I’m striving to cut the costs of our AWS resources. I’ve already done that for RDS, which I might cover in another article. Currently I’m working on cost optimisation of our EC2 resources, which would help us reduce our environmental footprint as well 🌿 Switching to the ARM architecture seems like a promising solution. As the title suggests, I haven’t achieved that yet, as it has turned out to be a slightly more complex task. To switch entirely, I will need to set up and spin up executors on ARM instances within our K8s cluster (so there is potentially content for another blogpost 🤓).

Looking for a new approach

Given the task of setting up a fresh environment for a new supporting project, I decided to upgrade our CI by improving the pipelines.

While exploring modern approaches, I came across an article discussing the use of Bake to enhance GitHub Actions by exporting cache layers. In it, the author shares insights on achieving 6x faster Docker builds for Symfony and API Platform projects.

Since I use GitLab CI instead of GitHub Actions, the techniques discussed in the article aren’t directly applicable to my situation. Still, the concept of exporting cache layers looks interesting and could potentially optimise build times.

Also, since I had set out to rewrite some bits of our CI/CD, I thought about switching to ARM instances on EC2 at the same time. That’s just switching the architecture from x86-64 to ARM. Must be simple, right? 🤔

Kaniko

During my research I discovered Kaniko (which has some relation to Google, though as the README says, “kaniko is not an officially supported Google product”). I found out that it supports exporting layers, which aligns with the optimisation strategy I’m aiming at. Here’s how a basic GitLab CI job would look:

build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.14.0-debug
    entrypoint: [""]
  script:
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"

See more details: https://docs.gitlab.com/ee/ci/docker/using_kaniko.html https://github.com/GoogleContainerTools/kaniko

However, I encountered the following problems with it. Despite its support for the --custom-platform flag, upon closer examination of the documentation I came across this note about it:

This is not virtualization and cannot help to build an architecture not natively supported by the build host. This is used to build i386 on an amd64 Host for example, or arm32 on an arm64 host.
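For reference, this is roughly how the flag would be used if the runner’s host were already ARM — a sketch I haven’t run, and the arm64 runner tag below is a hypothetical example, not something available to me on shared runners:

```yaml
build:arm:
  stage: build
  # requires a runner whose host is natively arm64; the tag is hypothetical
  tags: [arm64]
  image:
    name: gcr.io/kaniko-project/executor:v1.14.0-debug
    entrypoint: [""]
  script:
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"
      --custom-platform=linux/arm64
```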

For these supporting projects I use GitLab SaaS shared runners, and those runners are only available on the Linux x86-64 architecture. So utilising Kaniko to cross-build ARM images is not feasible in my current situation.

Another issue I encountered with Kaniko was its inability to handle secrets the way Docker does. For me, this is a crucial part of keeping the projects I work on secure. In simple terms, Docker makes it easy to securely pass sensitive information, like passwords or API keys, into the build process (examples from the official documentation):

RUN --mount=type=secret,id=mytoken \
    TOKEN=$(cat /run/secrets/mytoken) ...
docker build --secret id=mytoken,src=$HOME/.aws/credentials .

There is a corresponding issue in Kaniko’s repository which addresses this limitation, and it is still open at the time of writing. Someone in that thread mentioned a workaround: mount the secret in CI as a file under the /kaniko directory, since files in /kaniko are ignored during the image build (see comment). I tried that approach:

...
script:  
  - echo "${STAGING_DECRYPTION_SECRET}" > /kaniko/decryption_key  
  - /kaniko/executor  
    --context "${CI_PROJECT_DIR}/backend"  
...

but it appears that in this case the secret is revealed in the GitLab pipeline output, which does not make it much of a secret, really. Luckily, in my case I was able to replace the secret with a build-arg, since build-args are hidden in the pipeline output and I am not worried about this particular value appearing in the final image.

script:  
  - /kaniko/executor  
    --context "${CI_PROJECT_DIR}/backend"  
    --build-arg DECRYPTION_SECRET_VAR="${STAGING_DECRYPTION_SECRET}"
...

Nevertheless, the inability to build for ARM forced me to look for another solution.

Note: while in my situation it is completely safe to pass this secret as a build-arg, I do not recommend the approach. You should rely on proper secrets in the first place.

Buildah

After some research on Reddit, I stumbled upon Buildah, which seems like another promising solution. What makes Buildah appealing is that it is:

  • Aligned with the Open Container Initiative
  • Capable of a distributed caching mechanism: if a build fails midway, previously cached steps can still be reused, optimising the build process
  • Flexible in supporting custom architectures, making it suitable for the transition to ARM
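For illustration, a minimal GitLab job using Buildah might look like the following. This is a sketch I haven’t run: the quay.io/buildah/stable image and the vfs storage driver are my assumptions about what would work on a shared runner, and cross-building for ARM from an x86-64 host would still require emulation:

```yaml
build:buildah:
  stage: build
  image: quay.io/buildah/stable
  variables:
    # vfs avoids needing overlayfs privileges on shared runners
    STORAGE_DRIVER: vfs
  script:
    - buildah login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    # --platform can target ARM, but on an x86-64 host this still needs qemu emulation
    - buildah build --platform linux/arm64 -t "$CI_REGISTRY_IMAGE:latest" .
    - buildah push "$CI_REGISTRY_IMAGE:latest"
```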

In GitLab’s documentation, there’s a tutorial on using Buildah in a rootless environment, which could potentially address my needs. However, it mentions that a “runner already deployed to a gitlab-runner namespace” is required. While I haven’t attempted this yet, it’s worth investigating whether the SaaS runner supports this functionality.

Given the urgency of providing my team with a functioning environment without causing any slowdowns, I set Buildah aside as a future option to explore when time permits, and reverted to using Docker in Docker (DinD).

Docker in Docker

The old way of building images in our CI pipeline was to fetch the image every time. Realising the advantages of layer caching, I started looking into caching strategies to avoid fetching the image repeatedly. To deal with slow build times, I dug deeper and found the cache-to and cache-from flags.

Enabling BuildKit, or utilising buildx (the plugin which gives you build capabilities with BuildKit), facilitates these caching strategies without the need to pull the image repeatedly. BuildKit is truly amazing, as it gives you a lot of other capabilities, such as the automatic TARGETARCH and TARGETOS variables mentioned in Dunglas’ article as well.
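To illustrate those automatic variables with a hypothetical Dockerfile fragment (the tool name and download URL are made up for the example): BuildKit populates TARGETOS and TARGETARCH with the platform being built, which is handy for fetching the right binary per architecture:

```dockerfile
# syntax=docker/dockerfile:1
FROM alpine
# declared without a default: BuildKit fills these in automatically
ARG TARGETOS
ARG TARGETARCH
# e.g. resolves to ...-linux-arm64 when building with --platform linux/arm64
RUN wget -O /usr/local/bin/some-tool \
    "https://example.com/some-tool-${TARGETOS}-${TARGETARCH}"
```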

In my pipeline, I utilize docker buildx with specific parameters:

script:  
  - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY  
  - docker buildx create --use  
  - >
      docker buildx build -f ./backend/Dockerfile.staging
      -t $IMAGE_BACK_STAGING-app
      --target app_staging
      --cache-to type=registry,ref=$IMAGE_BACK_STAGING-app-cache,mode=max
      --cache-from type=registry,ref=$IMAGE_BACK_STAGING-app-cache
      --push ./backend

It’s fortunate that docker buildx supports the type=registry parameter out of the box, allowing us to leverage caching mechanisms effectively. If you’re considering implementing a similar approach in your pipelines, I recommend familiarising yourself with the modifiers available for these parameters in the documentation, particularly mode=max, which instructs buildx to store all layers, including intermediate ones, in the cache. By default (mode=min), only the layers of the final image are stored, so this modifier ensures that all layers are cached, optimising the build process.

But then I discovered the following behaviour: when something breaks your build (for example, an error in a Dockerfile instruction, even in the latest stage), buildx does not store any layers. Your image must be built completely; otherwise, all previously built layers are lost and the next build starts from the very beginning.

To be fair, this was not a big deal for me, as making errors in the Dockerfile is not in my plans. Nevertheless, having done all of that, I was still not satisfied with the result. The build was still rather slow, so I thought I would give switching to an ARM instance a go next time.

Final setup

Eventually it dawned on me: since I am using build-args, which are supported by Kaniko, and I am building for x86-64 anyway, there is nothing stopping me from using Kaniko.

This is the job I ended up with:

build:backend:staging:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.14.0-debug
    entrypoint: [""]
  script:
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}/backend"
      --dockerfile "${CI_PROJECT_DIR}/backend/Dockerfile"
      --destination "${IMAGE_BACK_STAGING}:latest"
      --target app_prod
      --skip-unused-stages
      --build-arg DECRYPTION_SECRET_VAR="${STAGING_DECRYPTION_SECRET}"
      --build-arg APP_ENV_VAR="staging"
      --cache=true
      --cache-copy-layers=true
      --cache-ttl=24h

Layers are cached automatically and builds are really quick. Another great thing about Kaniko is that you don’t have to write an explicit docker registry login in your script (at least when you’re using the GitLab container registry), which makes the script a bit cleaner and more readable.

Conclusion

Sometimes, to arrive at even the simplest solution, you have to explore every corner and gather all the information you need. The important thing is to learn as much as you can during that journey.

Bonus notes

  • When using Kaniko, don’t forget --skip-unused-stages: otherwise, Kaniko builds every stage of a multi-stage Dockerfile.
  • With multiple environments, you might want to include some files in one and exclude them in another. For that, you can rely on prefixed Dockerfile and .dockerignore files (see more). But this does not work if you have an unprefixed Dockerfile. Also, this might be unsupported by some build tools (I’ve definitely seen problems, but can’t find that issue again).
  • While using the DinD approach, prefer secrets over build-args.
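To illustrate the prefixed .dockerignore convention from the second note above, here is a sketch of the layout (file names are illustrative): with BuildKit, a file named after the Dockerfile takes precedence over the plain .dockerignore for that build.

```
backend/
├── Dockerfile
├── .dockerignore                     # used for the unprefixed Dockerfile
├── Dockerfile.staging
└── Dockerfile.staging.dockerignore   # used when building with -f Dockerfile.staging
```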