Jenkins - Read Labels From ECR Images

I’ve been working on converting some of our Java-based applications to run in Docker containers in an AWS ECS cluster. With our existing pipelines for these applications, we store metadata from the build as properties on the build artifact in Artifactory. This includes information like the git revision associated with the build, the branch it was pulled from, and so on. In our transition to Docker, we aren’t using Artifactory for image storage. Instead, each application ends up with an ECR repository. Unfortunately, those repositories do not provide a key-value metadata storage mechanism beyond standard Docker tags (which to me is a compelling reason not to use ECR, but I digress).

To counter this lack of functionality, we instead store this information as labels within the image itself. This works fine for the most part. Where it breaks down a bit is when we want to create a tag in Git for a particular version: to tag the right commit, we have to retrieve the git revision label from the image in ECR. At this stage in the pipeline, though, our Jenkins executor might not have the image downloaded locally to run a docker inspect operation. Performing that download is an option, but I had hoped to avoid the unnecessary overhead. That is how we arrived at this hack/solution.
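
For context, here is a rough sketch of the straightforward path that the rest of this post works around. It assumes the label is attached with docker build --label (one common way to do it, not necessarily how every pipeline does) and that the image is present on the executor when it is inspected. The helper names buildImageWithRevisionLabel and inspectGitRevision are made up for illustration; only the git_revision label name comes from our setup.

```groovy
// Hypothetical helper: build the image and record the current commit as a label.
def buildImageWithRevisionLabel(String imageTag) {
    def revision = sh(returnStdout: true, script: 'git rev-parse HEAD').trim()
    withEnv(["GIT_REVISION=${revision}", "IMAGE_TAG=${imageTag}"]) {
        // Single-quoted Groovy string: the shell expands the variables, not Groovy.
        sh 'docker build --label "git_revision=$GIT_REVISION" -t "$IMAGE_TAG" .'
    }
    return revision
}

// Hypothetical helper: read the label back with docker inspect.
// This only works when the image already exists on this executor.
def inspectGitRevision(String imageTag) {
    def revision = ''
    withEnv(["IMAGE_TAG=${imageTag}"]) {
        revision = sh(returnStdout: true, script:
            'docker inspect --format \'{{ index .Config.Labels "git_revision" }}\' "$IMAGE_TAG"'
        ).trim()
    }
    return revision
}
```

The second helper is the one that forces a pull when the image is not local, which is the overhead I wanted to avoid.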

Preconditions - Your Jenkins executor will need to have jq, the AWS CLI, and curl installed. You will also need some tolerance for hackery and an understanding of Groovy string interpolation and the Jenkins scripted pipeline DSL.

The code layout is contorted somewhat to make it (arguably) more readable. Account numbers have been changed to protect the innocent.

/*
 * Retrieves the Git revision hash from the 'git_revision' label of the provided
 * docker application image.
 */
def readGitRevisionFromImage(String name, String version){

    env.AWS_DEFAULT_REGION = 'us-east-1'

    def token = sh(returnStdout: true, script:
       """
       aws ecr get-authorization-token | jq -r '.authorizationData[].authorizationToken'
       """
       ).trim()

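    // Fetch the image manifest and pull out the digest of the config layer.
    // 'set +x' keeps the token out of the console log; -s silences curl's progress output.
    def digest = sh(returnStdout: true, script:
        """
        set +x
        curl -s -H \"Authorization: Basic ${token}\" \
            https://000000000000.dkr.ecr.us-east-1.amazonaws.com/v2/${name}/manifests/${version} \
            | jq -r .config.digest
        """
        ).trim()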

    def url = sh(returnStdout: true, script:
        """
        aws ecr get-download-url-for-layer --registry-id 000000000000 \
            --repository-name ${name} --layer-digest ${digest} | jq -r .downloadUrl
        """).trim()

    // -r prints the raw string; without it jq wraps the value in JSON quotes.
    def revision = sh(returnStdout: true, script:
        """
        curl -s --connect-timeout 5 '${url}' | jq -r .config.Labels.git_revision
        """).trim()

    return revision
}

The steps involved are:

  1. Get an authorization token for invoking the Docker registry API on our image. The jq invocation in this step extracts the 2k+ character authorization token from the response. The CLI's --query option (with --output text) could also return the raw value without using jq.
  2. Use said token in a query to get the manifest and extract the digest for the config layer. Note that the registry URL here is hard-coded. It doesn’t have to be: the previous call to get an authorization token also returns this endpoint, in the proxyEndpoint field. The jq invocation in this step grabs the sha256 digest for the top-level Docker ‘config’ layer.
  3. With that layer identifier, we then get the download URL for the config layer.
  4. Finally, we use that URL to pull down the config layer, which is a JSON document, and extract the .config.Labels.git_revision value from it.
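
Steps 1 and 2 mention two variations: using --query instead of jq, and taking the registry endpoint from the token response rather than hard-coding it. A rough sketch combining both might look like this. The names readEcrAuthorization and manifestUrl are made up for illustration, and the sketch assumes AWS_DEFAULT_REGION is already set on the executor.

```groovy
// Hypothetical helper: fetch the token and the registry endpoint in one CLI call.
// With --output text, the two selected fields come back on one line, tab-separated.
def readEcrAuthorization() {
    def out = sh(returnStdout: true, script: '''
        set +x
        aws ecr get-authorization-token \
            --query 'authorizationData[0].[authorizationToken,proxyEndpoint]' \
            --output text
    ''').trim()
    def parts = out.split(/\s+/)
    return [token: parts[0], endpoint: parts[1]]
}

// Hypothetical helper: build the manifest URL from the returned endpoint
// instead of a hard-coded account number and region.
def manifestUrl(Map auth, String name, String version) {
    return "${auth.endpoint}/v2/${name}/manifests/${version}".toString()
}
```

The token from the returned map goes into the same Authorization: Basic header as before.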

At that point, we have the label value for the git revision and we can proceed with tagging. One obvious enhancement here is to extract a more general method that accepts the name of the label as an argument. We performed this refactoring in the final version of our pipeline library. Here, I leave it as an exercise for the reader.
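
For anyone who would rather not start from scratch, here is a minimal sketch of what such a generalization could look like. It is not the version from our pipeline library. The method name readLabelFromImage and the environment variable names are invented for this example, and the account number and region are the same placeholders as above. The inputs are passed through withEnv so that the shell, rather than Groovy, expands them.

```groovy
// Sketch only: reads an arbitrary label from an image in ECR without pulling it.
def readLabelFromImage(String name, String version, String label) {
    // Placeholder values; substitute your own account and region.
    def registryId = '000000000000'
    def region = 'us-east-1'
    def value = ''

    withEnv(["AWS_DEFAULT_REGION=${region}",
             "REGISTRY_ID=${registryId}",
             "REGISTRY=${registryId}.dkr.ecr.${region}.amazonaws.com",
             "IMAGE_NAME=${name}",
             "IMAGE_VERSION=${version}",
             "LABEL=${label}"]) {
        // Single-quoted Groovy string, so every $VAR below is expanded by the shell.
        value = sh(returnStdout: true, script: '''
            set +x
            token=$(aws ecr get-authorization-token \
                --query 'authorizationData[0].authorizationToken' --output text)
            digest=$(curl -s -H "Authorization: Basic $token" \
                "https://$REGISTRY/v2/$IMAGE_NAME/manifests/$IMAGE_VERSION" \
                | jq -r .config.digest)
            url=$(aws ecr get-download-url-for-layer --registry-id "$REGISTRY_ID" \
                --repository-name "$IMAGE_NAME" --layer-digest "$digest" \
                --query downloadUrl --output text)
            # Prints nothing when the label is missing, so the caller gets an empty string.
            curl -s --connect-timeout 5 "$url" \
                | jq -r --arg label "$LABEL" '.config.Labels[$label] // empty'
        ''').trim()
    }
    return value
}
```

With that in place, the original method reduces to readLabelFromImage(name, version, 'git_revision').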