COPY --link is a new BuildKit feature that can substantially accelerate your Docker image builds. It works by copying files into independent image layers that don’t rely on the presence of their predecessors. You can add new content to images without the base image even existing on your system.
In this article, we’ll show what --link does and explain how it works. We’ll also look at some of the situations in which it shouldn’t be used.
What Is “--link”?
--link is a new optional argument for the existing Dockerfile COPY instruction. It changes the way copies work by creating a new snapshot layer each time you use it.
Regular COPY statements add files to the layer that precedes them in the Dockerfile. The contents of that layer need to exist on your disk so the new content can be merged in:
FROM alpine
COPY my-file /my-file
COPY another-file /another-file
The Dockerfile above copies my-file into the layer produced by the previous command. After the FROM instruction, the image consists of Alpine’s content:
bin/
dev/
etc/
...
The first COPY instruction produces an image that includes everything from Alpine, as well as the my-file file:
my-file
bin/
dev/
etc/
...
And the second COPY instruction adds another-file on top of this image:
another-file
my-file
bin/
dev/
etc/
...
The layer produced by each instruction includes everything that came before it, as well as anything newly added. At the end of the build, Docker uses a diffing process to work out the changes within each layer. The final image’s layer blobs contain only the files that were added at each snapshot stage, but this isn’t reflected in the assembly process during the build.
Adding --link changes this behavior. It causes COPY to create a new standalone filesystem each time it’s used. Instead of copying the new files on top of the previous layer, they’re sent to a completely different location to become an independent layer. Layers are subsequently linked together to produce the final image.
Let’s change the example Dockerfile to use --link:
FROM alpine
COPY --link my-file /my-file
COPY --link another-file /another-file
The result of the FROM instruction is unchanged. It yields the Alpine layer, with all that image’s content:
bin/
dev/
etc/
...
The first COPY instruction has a noticeably different effect. This time, another independent layer is created. It’s a new filesystem containing only my-file:
my-file
Then the second COPY instruction creates another new snapshot with only another-file:
another-file
When the build completes, Docker stores these independent snapshots as new layer archives (tarballs). The tarballs are linked back into the chain of preceding layers, building up the final image. This consists of all three snapshots merged together, resulting in a filesystem that matches the original one when containers are created:
my-file
another-file
bin/
dev/
etc/
...
This image from the BuildKit project illustrates the differences between the two approaches.
Adding “COPY --link” to Your Builds
COPY --link is only available when you’re using BuildKit to build your images. Either run your build with docker buildx build or use docker build with the DOCKER_BUILDKIT=1 environment variable set.
You must also opt in to the Dockerfile v1.4 syntax using a comment at the top of your file:
# syntax=docker/dockerfile:1.4
FROM alpine:latest
COPY --link my-file /my-file
COPY --link another-file /another-file
Now you can build your image with support for linked copies:
DOCKER_BUILDKIT=1 docker build -t my-image:latest .
Images built from Dockerfiles using COPY --link can be used like any other. You can start containers from them with docker run and push them straight to registries. The --link flag only affects how content is added to the image layers during the build.
Why Linked Copies Matter
Using the --link flag allows build caches to be reused even when content you COPY in changes. In addition, builds may be able to complete without their base image even existing on your machine.
Returning to the example from above, standard COPY behavior requires the alpine image to exist on your Docker host before the new content can be added. The image will be downloaded automatically during the build if you haven’t previously pulled it.
With linked copies, Docker doesn’t need the alpine image’s content. It pulls the alpine manifest, creates new independent layers for the copied files, then creates a revised manifest that links the layers into those provided by alpine. The content of the alpine image (its layer blobs) will only be downloaded if you start a container from your new image or export it to a tar archive. When you push the image to a registry, that registry will store its new layers and remotely acquire the alpine ones.
This functionality facilitates efficient image rebases too. Perhaps you’re currently distributing a Docker image using the latest Ubuntu 20.04 LTS release:
FROM golang AS build
...
RUN go build -o /app .
FROM ubuntu:20.04
COPY --link --from=build /app /bin/app
ENTRYPOINT ["/bin/app"]
You can build the image with caching enabled using BuildKit’s --cache-to flag. The inline cache stores build cache data inside the output image, where it can be reused in subsequent builds:
docker buildx build --cache-to type=inline -t example-image:20.04 .
Now let’s say you’d like to provide an image that’s based on the next LTS after its release, Ubuntu 22.04:
FROM golang AS build
...
RUN go build -o /app .
FROM ubuntu:22.04
COPY --link --from=build /app /bin/app
ENTRYPOINT ["/bin/app"]
Rebuild the image using the cache data embedded in the original version:
docker buildx build --cache-from example-image:20.04 -t example-image:22.04 .
The build will complete almost instantly. Using the cached data from the existing image, Docker can verify the files needed to build /app haven’t changed. This means the cache for the independent layer created by the COPY instruction remains valid. As this layer doesn’t depend on any other, the ubuntu:22.04 image won’t be pulled either. Docker merely links the snapshot layer containing /bin/app into a new manifest within the ubuntu:22.04 layer chain. The snapshot layer is effectively “rebased” onto a new parent image, without any filesystem operations occurring.
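Rather than maintaining two near-identical Dockerfiles, you could make the Ubuntu release a build argument. The sketch below is one way to do that, not part of the original example: UBUNTU_VERSION is a hypothetical argument name, and the WORKDIR and COPY . . lines assume the Go sources sit at the root of the build context. An ARG declared before the first FROM can be referenced in FROM lines:

```dockerfile
# syntax=docker/dockerfile:1.4
# Hypothetical build argument selecting the Ubuntu release; 20.04 by default
ARG UBUNTU_VERSION=20.04

FROM golang AS build
WORKDIR /src
# Assumption: the Go module sources live at the root of the build context
COPY . .
RUN go build -o /app .

# ARGs declared before the first FROM are usable in FROM lines
FROM ubuntu:${UBUNTU_VERSION}
# Linked copy: this layer doesn't depend on the Ubuntu layers beneath it
COPY --link --from=build /app /bin/app
ENTRYPOINT ["/bin/app"]
```

Under these assumptions, adding --build-arg UBUNTU_VERSION=22.04 to the --cache-from build shown above should rebase the same /bin/app layer onto the new release without editing the file.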
The model also optimizes multi-stage builds where changes can occur between any of the stages:
FROM golang AS build
RUN go build -o /app .
FROM config-builder AS config
RUN generate-config --out /config.yaml
FROM ubuntu:latest
COPY --link --from=config /config.yaml build.conf
COPY --link --from=build /app /bin/app
Without --link, any change to the generated config.yaml causes ubuntu:latest to be pulled and the file to be copied in. The binary then has to be copied in again, because the cache for its COPY layer is invalidated by the filesystem changes beneath it. With linked copies, a change to config.yaml allows the build to continue without pulling ubuntu:latest or copying the binary again. The snapshot layer with build.conf inside is simply replaced by a new version that’s independent of all the other layers.
When Not To Use It
There are some situations where the --link flag won’t work correctly. Because it copies files into a new layer, instead of adding them on top of the previous one, you can’t use ambiguous references as your destination path:
COPY --link my-file /data
With a regular COPY, my-file will be copied to /data/my-file if /data already exists as a directory in the image. With --link, the target layer’s filesystem will always be empty, so my-file gets written to /data.
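One way to remove the ambiguity is to make the destination explicit. A trailing slash tells COPY that the destination is a directory, so the result no longer depends on what already exists in the image. A minimal sketch, assuming my-file and another-file are present in the build context:

```dockerfile
# syntax=docker/dockerfile:1.4
FROM alpine:latest
# The trailing slash marks /data/ as a directory, so the file lands at
# /data/my-file whether or not /data exists in the base image
COPY --link my-file /data/
# Alternatively, spell out the full destination file path
COPY --link another-file /data/another-file
```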
The same consideration applies to symlink resolution. Standard COPY automatically resolves destination paths that are symlinks in the image. When you’re using --link, this behavior isn’t supported as the symlink won’t exist in the copy’s independent layer.
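If a destination in your base image is a symlink, one workaround is to copy directly to the path the symlink points at. The sketch below sets up that situation itself: /srv/app, /opt/app, and config.yaml are hypothetical names, and the RUN line stands in for a base image that already ships the symlink:

```dockerfile
# syntax=docker/dockerfile:1.4
FROM alpine:latest
# Hypothetical setup: /srv/app is a symlink to /opt/app
RUN mkdir -p /opt/app && ln -s /opt/app /srv/app

# A plain COPY would follow the symlink:
#   COPY config.yaml /srv/app/config.yaml   -> writes /opt/app/config.yaml
# The linked layer can't see the symlink, so target the resolved path instead
COPY --link config.yaml /opt/app/config.yaml
```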
It’s recommended you start using --link wherever these limitations don’t apply. Adopting this feature can speed up your builds and make caching more powerful. If you can’t immediately remove ambiguous or symlinked destination paths, you can keep using the existing COPY instruction. These backwards-incompatible changes are why --link is an optional flag instead of the new default.
COPY --link is a new Dockerfile feature that can make builds quicker and more efficient. Images using linked copies don’t need to pull previous layers just so files can be copied into them. Docker creates a new independent layer for each COPY instead, then links those layers back into the chain.
You can start using linked copies now if you’re building images with BuildKit and the latest version of the Buildx or Docker CLI. Adopting --link is a new Docker build best practice, provided you’re not affected by the changes to destination path resolution that it necessitates.