Unfortunately, using Docker does add some overhead in some scenarios. This guide provides guidance on a number of common problem areas:
- Image build performance
- I/O performance
- Database schema migrations and test data setup
- Shutdown / cleanup time
If your container is using a build directory and Dockerfile rather than a pre-existing image, building this image can sometimes take quite a while.
If using a pre-built image is not an option, there are two things you can try to improve image build times:
When Docker is building an image, it executes each step in the Dockerfile in sequence. When it is safe to do so, Docker can reuse the cached output of a previous image build for the current step. This can save a significant amount of time.
This caching behaviour is an important consideration when ordering commands in a Dockerfile, especially if it will be built many times. If infrequently changing steps appear first, then their output can be cached and only the later steps need to be rebuilt, saving time.
For example, consider this Dockerfile:
my-app changes, Docker will need to run the
COPY step as well as the
RUN apk add... step. However, there's no need for the
steps to run in this order. If we swap the order of these two steps, then when
my-app changes, the only step that needs to be rebuilt is the
BuildKit support in Batect is currently experimental. Some features may not work as expected.
Please open an issue if you encounter any problems.
BuildKit is a new image builder available in recent versions of Docker. It offers significantly improved performance over the legacy image builder as well as some other new features.
You can enable BuildKit by setting the
DOCKER_BUILDKIT environment variable to
1. Alternatively, to enable BuildKit on a once-off basis, run
Batect with the
--enable-buildkit flag, for example:
./batect --enable-buildkit build.
You can further improve image build performance with BuildKit by providing image digests for base images. Providing image digests allows BuildKit
to skip checking if the base image is up-to-date. For example, instead of using
FROM alpine:3.12.1, use
FROM alpine:[email protected]:d7342993700f8cd7aba8496c2d0e57be0666e80b4c441925fc6f9361fa81d10e.
tl;dr: If you're seeing slow build times on macOS or Windows, using Batect's caches as well as volume mount options such
cached might help
Docker requires features only found in the Linux kernel, and so on macOS and Windows, Docker Desktop runs a lightweight Linux virtual machine to host Docker. However, while this works perfectly fine for most situations, there is some overhead involved in operations that need to work across the host / virtual machine boundary, particularly when it comes to mounting files or directories into a container from the host.
While the throughput of mounts on macOS and Windows is generally comparable to native file access within a container, the latency performing I/O operations such as opening a file handle can often be significant. This overhead is introduced because these need to cross from the Linux VM hosting Docker to the host OS and back again.
There are two ways to improve the performance of file I/O when using Batect:
The performance penalty of mounting a file or directory from the host machine does not apply to Docker volumes, as these remain entirely on the Linux VM hosting Docker. This makes them perfect for directories such as caches where persistence between task runs is required, but easy access to their contents is not necessary.
Batect makes this simple to configure. In your container definition, add a mount to
For example, for a typical Node.js application, include the following in your configuration to cache the
node_modules directory in a volume:
Batect uses a cache initialisation container to prepare volumes for use as caches. This process ensures that volumes used as caches are readable by containers running with run as current user mode enabled, and that new caches are empty the first time they are used.
To make it easier to share caches between builds on ephemeral CI agents, you can instruct Batect to use directories instead of volumes, and then use these
directories as a starting point for subsequent builds. Run Batect with
--cache-type=directory to enable this behaviour, then save and restore the
.batect/caches directory between builds.
This is only recommended on Linux CI agents, as using mounted directories instead of volumes has no performance impact on Linux.
The performance penalty described above does not apply when mounting directories into Windows containers. Batect therefore always uses directory mounts for
caches on Windows containers, even if
--cache-type=volume is specified on the command line.
This section only applies to macOS-based hosts, and is only supported by Docker version 17.04 and higher.
cached mode is harmless for other host operating systems.
For situations where a cache is not appropriate (eg. mounting your code from the host into a build environment), specifying the
cached volume mount option
can result in significant performance improvements.
Before you use this option in another context, you should consult the documentation to understand the implications of it.
For example, instead of defining your container like this:
Setting this option will not affect Linux or Windows hosts, so it's safe to commit and share this in a project where some developers use macOS and others use Linux or Windows.
tl;dr: Try to do as much work as possible at image build time, rather than doing it every time the container starts
A significant amount of time during integration or journey testing with a database can be taken up by preparing the database for use. Setting up the database schema (usually with some kind of migrations system) and adding the initial test data can take quite some time, especially as the application evolves over time.
One way to address this is to bake the schema and test data into the Docker image used for the database, so that this setup cost only has to be paid when building the image or when the setup changes, rather than on every test run. The exact method for doing this will vary depending on the database system you're using, but the general steps that would go in your Dockerfile are:
- Copy schema and test data scripts into container
- Temporarily start database daemon
- Run schema and data scripts against database instance
- Shut down database daemon
tl;dr: Make sure signals such as
SIGKILL are being passed to the main process
If you notice that post-task cleanup for a container is taking longer than expected, and that container starts the main process from a
shell script, make sure that signals such as
SIGKILL are being forwarded to the process.
Otherwise, if these signals are not forwarded correctly, Docker will wait 10 seconds for the application to respond to the signal before timing out and terminating the process.
For example, instead of using: