Do I understand correctly that each image drags along its own OS as a dependency (Debian in all its versions, Ubuntu in all its versions, Alpine in all its versions, and whatever else there is), so that a large number of different images will quickly eat up free disk space?

    2 answers

    Yes, you end up with images of (almost) every OS pulled in as dependencies of your images.
    No, they will not eat your entire drive in one go.


    First, Docker tries to avoid duplicating data by means of its storage drivers.

    Most Docker images are based on some other image (the FROM directive in the Dockerfile), and as a rule files inherited unchanged from one image by others are stored on disk only once. Exactly how this is done, and how access to such files is organized, depends on the driver. The storage driver documentation describes the technical details of each one; for example (a short command-line sketch follows the list):

    • AUFS presents the container's file system as a union of layers, starting with the base layer (read-only), going through the intermediate layers (read-only), and ending with the container's own layer (writable).
    • ZFS places the base file system directly on the storage device, and each subsequent layer (including the container's own layer) is laid on top as a clone of a snapshot of the previous layer.
    • A file's history is kept in only one version per layer. That is, if a file was modified twice while the layer was being built, only the latest version is saved; if it was created and then deleted, the layer contains no mention of it at all. Deleting files that came from the parent image is implemented with special "whiteout" files.
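
    To illustrate both points, here is a minimal command-line sketch; the choice of base image and the demo-a/demo-b/demo-whiteout tags are arbitrary names for this example:

        # Two images built FROM the same base share its layers on disk:
        printf 'FROM debian:bookworm-slim\nRUN echo a > /a\n' | docker build -t demo-a -
        printf 'FROM debian:bookworm-slim\nRUN echo b > /b\n' | docker build -t demo-b -
        docker system df -v    # the SHARED SIZE column shows data stored only once

        # Deleting a file in a later layer does not shrink the image: the layer
        # that created the file keeps the data, and the next layer merely
        # records a "whiteout" entry on top of it:
        printf 'FROM debian:bookworm-slim\nRUN dd if=/dev/zero of=/big bs=1M count=50\nRUN rm /big\n' \
            | docker build -t demo-whiteout -
        docker history demo-whiteout    # the dd layer still weighs ~50 MB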

    Secondly, compared to a server OS (leaving GUIs and everything related to them out of consideration entirely), OS images for Docker can discard many things as superfluous:

    • An init process and its accompanying machinery. Docker starts the "payload" process directly (sometimes via a shell), and that process occupies PID 1.
    • Its own kernel and the utilities for maintaining it. The host kernel is used anyway, which is why containers are so tightly tied to the OS of the host they run on.
    • Daemons. It is considered good practice to keep exactly one process per container, so files associated with daemons (mail agents, databases, auto-update systems, etc.) can be thrown out. Whoever needs them can install them separately, but that is rarely necessary.
    • Files that go stale, such as package lists from the repositories. They would have to be replaced with fresh versions when inheriting the image anyway, so it is pointless to include them. This is a popular stumbling block when building containers: packages refuse to install unless the Dockerfile explicitly requests a fresh package list from the repositories (see the sketch after this list).
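
    The usual fix for the last point is a Dockerfile along these lines (the curl package here is just an example): update the lists and install in the same RUN instruction, then delete the lists again so they are not baked into the layer:

        FROM debian:bookworm-slim
        # Base images ship without apt package lists, so `apt-get install`
        # alone would fail; fetch the lists, install, and clean up in one
        # RUN so the lists don't add weight to the layer:
        RUN apt-get update \
         && apt-get install -y --no-install-recommends curl \
         && rm -rf /var/lib/apt/lists/*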

    Alpine Linux is used in containers quite often: its container edition takes up a little over 4 megabytes (though inheriting from it adds some weight in the form of temporary files). For comparison, its standalone installation image for full-fledged systems takes up a little over 100 megabytes.
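
    You can verify the sizes yourself; the exact figures vary with the image version and architecture:

        docker pull alpine
        docker pull debian:bookworm-slim
        docker images    # compare the SIZE column: Alpine stays in single-digit megabytes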

    Other container distributions are heavier, but they are still far more compact than their full-fledged counterparts.

      Yes, exactly. Working against this are layer deduplication and the use of more compact base images such as Alpine. On top of that, images try to discard everything superfluous: for example, the kernel and the utilities for maintaining it, since the host kernel is used anyway. So disk space gets eaten not all that quickly, but certainly faster than when the same products are set up by hand on one system without Docker.

      Read more about freeing up space here.
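
      As a quick sketch, Docker's built-in commands for this:

          docker system df     # where the space went: images, containers, volumes, build cache
          docker image prune   # remove dangling images
          docker system prune  # also removes stopped containers, unused networks and
                               # the build cache (add --volumes to include unused volumes)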