
Improve performance

Unfortunately, using Docker adds overhead in some scenarios. This guide covers a number of common problem areas:

Image build performance

If your container is using a build directory and Dockerfile rather than a pre-existing image, building this image can sometimes take quite a while.

If using a pre-built image is not an option, there are two things you can try to improve image build times:

Order build steps to take advantage of Docker's image build cache

When Docker is building an image, it executes each step in the Dockerfile in sequence. If a step and all of the steps before it are unchanged since a previous build (including any files the step copies into the image), Docker can reuse the cached output of that build for the current step. This can save a significant amount of time.

This caching behaviour is an important consideration when ordering commands in a Dockerfile, especially if it will be built many times. If infrequently changing steps appear first, then their output can be cached and only the later steps need to be rebuilt, saving time.

For example, consider this Dockerfile:

Dockerfile
FROM alpine:3.12.1
RUN mkdir -p /app
COPY my-app /app/my-app
RUN apk add --no-cache ruby

Every time my-app changes, Docker will need to rerun the COPY step and, because it comes after COPY, the RUN apk add... step as well. However, there's no need for the steps to run in this order. If we swap the order of these two steps, then when my-app changes, the only step that needs to be rebuilt is the COPY step:

Dockerfile
FROM alpine:3.12.1
RUN mkdir -p /app
RUN apk add --no-cache ruby
COPY my-app /app/my-app

Use BuildKit

caution

BuildKit support in Batect is currently experimental. Some features may not work as expected.

Please open an issue if you encounter any problems.

BuildKit is a new image builder available in recent versions of Docker. It offers significantly improved performance over the legacy image builder as well as some other new features.

You can enable BuildKit by setting the DOCKER_BUILDKIT environment variable to 1. Alternatively, to enable BuildKit on a once-off basis, run Batect with the --enable-buildkit flag, for example: ./batect --enable-buildkit build.

You can further improve image build performance with BuildKit by providing image digests for base images. Providing image digests allows BuildKit to skip checking if the base image is up-to-date. For example, instead of using FROM alpine:3.12.1, use FROM alpine:3.12.1@sha256:d7342993700f8cd7aba8496c2d0e57be0666e80b4c441925fc6f9361fa81d10e.

I/O performance

tip

tl;dr: If you're seeing slow build times on macOS or Windows, using Batect's caches might help.

Docker requires features only found in the Linux kernel, and so on macOS and Windows, Docker Desktop runs a lightweight Linux virtual machine to host Docker. While this works well in most situations, there is some overhead involved in operations that need to cross the boundary between the host and the virtual machine, particularly when mounting files or directories into a container from the host.

While the throughput of mounts on macOS and Windows is generally comparable to native file access within a container, the latency of I/O operations such as opening a file handle can often be significant. This overhead is introduced because each operation must cross from the Linux VM hosting Docker to the host OS and back again.

This increased latency quickly accumulates, especially when many file operations are involved. This particularly affects languages such as JavaScript and Go that encourage distributing all dependencies as source code and breaking codebases into many small files: even a warm build with no source code changes still requires the compiler to examine each dependency file to ensure that the cached build result is up-to-date.

The primary way to improve the performance of file I/O when using Batect is through using a Batect cache backed by a Docker volume wherever possible.

info

Previously, using the cached mount option was recommended for non-volume mounts. This option only ever applied to macOS hosts and was silently ignored on other host operating systems.

With the introduction of the gRPC file sharing backend in more recent versions of Docker Desktop for macOS, the cached mount option has been deprecated and removed. Continuing to use it in your projects is harmless, but it no longer alters Docker's behaviour in any way and can be removed from your projects.

Cache volumes

The performance penalty of mounting a file or directory from the host machine does not apply to Docker volumes, as these remain entirely on the Linux VM hosting Docker. This makes them perfect for directories such as caches where persistence between task runs is required, but easy access to their contents is not necessary.

Batect makes this simple to configure. In your container definition, add a mount to volumes with type: cache.

For example, for a typical Node.js application, include the following in your configuration to cache the node_modules directory in a volume:

batect.yml
containers:
  build-env:
    image: "node:13.8.0"
    volumes:
      - local: .
        container: /code
      - type: cache
        name: app-node-modules
        container: /code/node_modules
    working_directory: /code

tip

To make it easier to share caches between builds on ephemeral CI agents, you can instruct Batect to use directories instead of volumes, and then use these directories as a starting point for subsequent builds. Run Batect with --cache-type=directory to enable this behaviour, then save and restore the .batect/caches directory between builds.

This is only recommended on Linux CI agents, as using mounted directories instead of volumes has no performance impact on Linux.
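The save and restore steps described in this tip could look something like the following minimal sketch. It assumes tar is available on the CI agent; the function names and the archive path are hypothetical, and how the archive is carried between builds depends on your CI system.

```shell
#!/usr/bin/env bash
# Sketch only: function names and archive paths are hypothetical.
# Run these from the project directory that contains .batect/caches.

# Restore .batect/caches from an archive saved by a previous build, if one exists.
restore_batect_caches() {
  archive="$1"
  if [ -f "$archive" ]; then
    tar -xzf "$archive"
  fi
}

# Save .batect/caches to an archive for the next build to restore.
save_batect_caches() {
  archive="$1"
  if [ -d .batect/caches ]; then
    tar -czf "$archive" .batect/caches
  fi
}

# Typical usage on a CI agent:
#   restore_batect_caches /path/to/ci-storage/batect-caches.tar.gz
#   ./batect --cache-type=directory build
#   save_batect_caches /path/to/ci-storage/batect-caches.tar.gz
```

Many CI systems provide their own mechanism for saving and restoring a directory between builds, in which case pointing that mechanism at .batect/caches is likely simpler than managing an archive by hand.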

caution

If you mount a cache over a directory that already exists in the container's image, and you are using volumes for caches (that is, you are not running with --cache-type=directory), the cache inherits the contents of that directory from the image the first time it is created.

Windows containers

The performance penalty described above does not apply when mounting directories into Windows containers. Batect therefore always uses directory mounts for caches on Windows containers, even if --cache-type=volume is specified on the command line.

Database schema migrations and test data setup

tip

tl;dr: Try to do as much work as possible at image build time, rather than doing it every time the container starts.

A significant amount of time during integration or journey testing with a database can be taken up by preparing the database for use. Setting up the database schema (usually with some kind of migrations system) and adding the initial test data can be slow, and tends to get slower as the application evolves.

One way to address this is to bake the schema and test data into the Docker image used for the database, so that this setup cost only has to be paid when building the image or when the setup changes, rather than on every test run. The exact method for doing this will vary depending on the database system you're using, but the general steps that would go in your Dockerfile are:

  1. Copy schema and test data scripts into container
  2. Temporarily start database daemon
  3. Run schema and data scripts against database instance
  4. Shut down database daemon
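
As an illustration, steps 2 to 4 could be sketched for PostgreSQL as a script that a Dockerfile RUN step executes after the scripts have been copied into the image. This is an assumption-laden sketch rather than a recommended implementation: the function name and both directory arguments are hypothetical, it assumes initdb, pg_ctl and psql are on the PATH, and it assumes the script runs as the operating system user that owns the database. The data directory should not be a path that the base image declares as a volume, as changes made to a volume during an image build are not kept in the image.

```shell
#!/usr/bin/env bash
# Sketch for PostgreSQL only. bake_database and its arguments are hypothetical.
# Assumes initdb, pg_ctl and psql are on the PATH and that this runs as the
# operating system user that owns the database files.
bake_database() {
  data_dir="$1"
  scripts_dir="$2"

  # Create an empty database cluster inside the image.
  initdb -D "$data_dir" || return 1

  # Step 2: temporarily start the daemon and wait until it accepts connections.
  pg_ctl -D "$data_dir" -w start || return 1

  # Step 3: run the schema and test data scripts in name order,
  # stopping the daemon and failing the build at the first error.
  for script in "$scripts_dir"/*.sql; do
    [ -e "$script" ] || continue
    psql -v ON_ERROR_STOP=1 -d postgres -f "$script" || {
      pg_ctl -D "$data_dir" -m fast -w stop
      return 1
    }
  done

  # Step 4: shut the daemon down cleanly so the data files in the image are consistent.
  pg_ctl -D "$data_dir" -m fast -w stop
}

# Example call (paths are hypothetical):
#   bake_database /db/data /tmp/db-scripts
```

The container would then need to be configured to start the database from the same data directory, so that it picks up the prepared schema and data instead of initialising an empty database.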

Shutdown / cleanup time

tip

tl;dr: Make sure signals such as SIGTERM are being passed to the main process.

If you notice that post-task cleanup for a container is taking longer than expected, and that container starts the main process from a shell script, make sure that signals such as SIGTERM are being forwarded to the process.

If these signals are not forwarded, the application never receives the request to stop, and Docker waits 10 seconds before timing out and forcibly terminating the process with SIGKILL.

For example, instead of using:

#! /usr/bin/env bash

/app/my-really-cool-app --do-stuff

use this:

#! /usr/bin/env bash

exec /app/my-really-cool-app --do-stuff
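
The difference between the two scripts can be observed without Docker. The following is a minimal sketch in which all file names are hypothetical and app.sh is a stand-in for the real application: each wrapper prints its own process ID and then starts the application, which prints the process ID it is running as.

```shell
#!/usr/bin/env bash
# Sketch: compare a wrapper that runs the app as a child with one that uses exec.
# All file names here are hypothetical stand-ins.
workdir="$(mktemp -d)"

# Stand-in for the application: prints the PID it is running as.
printf '%s\n' 'echo "$$"' > "$workdir/app.sh"

# Wrapper without exec: prints its own PID, then runs the app as a child process.
printf '%s\n' 'echo "$$"' "bash \"$workdir/app.sh\"" > "$workdir/plain.sh"

# Wrapper with exec: prints its own PID, then replaces itself with the app.
printf '%s\n' 'echo "$$"' "exec bash \"$workdir/app.sh\"" > "$workdir/with-exec.sh"

# Count the distinct PIDs each wrapper reports.
plain_pids="$(bash "$workdir/plain.sh" | sort -u | wc -l | tr -d ' ')"
exec_pids="$(bash "$workdir/with-exec.sh" | sort -u | wc -l | tr -d ' ')"

echo "distinct PIDs without exec: $plain_pids"
echo "distinct PIDs with exec: $exec_pids"

rm -rf "$workdir"
```

The wrapper without exec reports two distinct PIDs, because the shell stays alive as the parent of the application. The wrapper with exec reports one, because the application takes over the shell's process. When the wrapper is the container's main process, this means that with exec the application itself is the process that receives the signal from Docker.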
