Dockerfile: why do ADD and RUN curl result in different image sizes?

Issue

This Content is from Stack Overflow. Question asked by John Doe

I’ve recently been refactoring a Dockerfile and decided to try ADD instead of RUN curl to make the file cleaner. To my surprise, this resulted in a substantial size difference:

$ docker images | grep test
test    curl    3aa809928665   7 minutes ago    746MB
test    add     da152355bb4d   3 minutes ago    941MB

Even more surprisingly, when I tried a few Dockerfiles that do nothing except ADD or curl things, their sizes were identical. I also tried with and without BuildKit; the result is the same (although images built without BuildKit are slightly smaller).

Here’s the actual Dockerfile on Pastebin. I don’t understand why this happens with this particular Dockerfile, because essentially I’m doing exactly the same things.

Any ideas?



Solution

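You see this because a file brought in by ADD stays in the layer that ADD created, even if a later instruction removes it. Each instruction that changes the filesystem produces its own layer, and a later rm only hides the file in a new layer; it does not shrink the earlier one. Consider the following Dockerfiles: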

# a
FROM alpine:latest
RUN apk add --no-cache curl

ADD https://www.python.org/ftp/python/3.10.7/Python-3.10.7.tar.xz Python.tar.xz
RUN rm Python.tar.xz

# b
FROM alpine:latest
RUN apk add --no-cache curl

RUN curl -o Python.tar.xz https://www.python.org/ftp/python/3.10.7/Python-3.10.7.tar.xz 
RUN rm Python.tar.xz

# c
FROM alpine:latest
RUN apk add --no-cache curl

RUN curl -o Python.tar.xz https://www.python.org/ftp/python/3.10.7/Python-3.10.7.tar.xz && \
    rm Python.tar.xz

Building each of them in the same context, I got the following results:

REPOSITORY   TAG       IMAGE ID       CREATED          SIZE
<none>       <none>    cc79832a5ffa   9 seconds ago    27.3MB
<none>       <none>    87ea16448764   13 seconds ago   7.68MB
<none>       <none>    7f794f03b960   18 seconds ago   27.3MB
alpine       latest    9c6f07244728   5 weeks ago      5.54MB

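(The 7.68MB image is the one built from Dockerfile c, where the download and the rm share a single RUN. In a and b the archive, roughly 19.6MB judging by the difference between 27.3MB and 7.68MB, remains in an earlier layer.)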

If you finish a layer while it still contains files you don’t need in the final image, that space is wasted, because later layers cannot reclaim it. So your single RUN command, which downloads and deletes within one layer, is the most efficient of these options. To improve readability, you could adopt a multi-stage build, so that all curl/ADD and unzip/tar -x commands are isolated in a build stage and only the required binaries are copied from the build stage to the deploy stage. I’m not sure, however, that you’ll gain much here.
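
For illustration, a multi-stage variant might look roughly like the sketch below. It is not taken from your Dockerfile: the stage name build, the /src and /opt/python paths, and the copied LICENSE file are placeholders I made up, and the LICENSE file only stands in for whatever artifacts you actually need. The point is that ADD can be used freely in the build stage, because none of that stage's layers end up in the final image.

```dockerfile
# Build stage: everything downloaded or unpacked here lives only in this
# stage's layers, which are not part of the final image.
FROM alpine:latest AS build
# tar and xz are installed explicitly so that `tar -xJf` does not depend
# on how the base image's built-in tar was compiled (an assumption made
# for safety, not something your Dockerfile is known to need).
RUN apk add --no-cache tar xz
WORKDIR /src

# ADD keeps the file readable; the archive stays in this stage's layer,
# but that layer is never shipped.
ADD https://www.python.org/ftp/python/3.10.7/Python-3.10.7.tar.xz Python.tar.xz
RUN tar -xJf Python.tar.xz
# Any compile/install steps would go here (omitted in this sketch).

# Deploy stage: starts from a clean base and copies only what is needed.
FROM alpine:latest
# Placeholder: LICENSE stands in for the real build outputs.
COPY --from=build /src/Python-3.10.7/LICENSE /opt/python/LICENSE
```

With this layout the final image contains the base image plus the single COPY layer, so there is no need to chain download and cleanup into one RUN just to keep the size down.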


This question was asked on Stack Overflow by John Doe and answered by SUTerliakov. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, CC BY-SA 4.0.
