Stateful containers have already been the subject of plenty of heated debate, in particular containers that hold a database.
Here it is important to decide what we are trying to achieve. The options are as follows:
- Immutable database state. The image contains the DB runtime, its configs (possibly using environment variables at startup), the schema SQL (DDL) and the data SQL (DML) as of launch time. Apart from the environment variables used at each start, we get the same thing every time. You can test it and be sure that every developer gets the same result. From time to time, you can take a SQL dump and build a new version of the image. Downside: with this approach, changes to the database state are lost. Good for development.
- Immutable configuration and schema. The DB runtime, configs and DDL are packed into a container and verified. The data, in the form of the database's binary working files (and transaction logs), is attached via a volume. Accordingly, the data is not immutable. We handle it separately: back it up, provide fault tolerance. There is a danger: only one container at a time can work with a given set of data files without risking corruption. Other developers launching the container have to attach their own data set from somewhere, and it will differ from the first one. This is not great for development, but there is no other way to get proper backups without losing data between backups.
- A variation on the previous option: store the data in a separate container in the form of SQL (DML), rebuild it periodically, and attach it to the database runtime as a volume. Again, in development this gives a reproducible database configuration, including data. But the runtime with its configuration and schema is versioned separately from the data. For the data container to be reproducible, it must be built from the SQL (DML) script each time. This can be convenient: the deltas are smaller, and the SQL can be stored in Git, for example. The downside is that this is not always feasible. For example, storing a large database dump in Git is pointless: the diffs will tell you nothing.
- The option of backing up the current state of the production container is not considered here, because it is an Ops smell. We end up with some binary whose origin is unclear. It is not reproducible, and it stores a lot of excess garbage, such as logs, or garbage that accumulates simply because of how the Docker container file system works, since it preserves deltas.
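To make the difference between the first two options concrete, here is a minimal sketch that only builds the `docker run` argument lists and does not execute anything. The image tags, the host directory, the `DB_PASSWORD` variable and the in-container data path are all hypothetical placeholders; a real database image defines its own data directory and variables.

```python
from typing import Dict, List, Optional


def docker_run_args(
    image: str,
    env: Dict[str, str],
    data_volume: Optional[str] = None,
    data_dir: str = "/var/lib/db/data",  # hypothetical path inside the container
) -> List[str]:
    """Build the argument list for `docker run` without executing it."""
    args = ["docker", "run", "--detach"]
    # In option 1, environment variables are the only thing that varies per start.
    for key, value in sorted(env.items()):
        args += ["--env", f"{key}={value}"]
    # In option 2, the data files live outside the image and are mounted as a volume.
    if data_volume is not None:
        args += ["--volume", f"{data_volume}:{data_dir}"]
    args.append(image)
    return args


# Option 1: runtime, configs, DDL and DML are all baked into the image (hypothetical tag).
dev_args = docker_run_args("example/app-db-with-data:1.4", {"DB_PASSWORD": "dev"})

# Option 2: the image holds runtime, configs and DDL; data comes from a host directory.
prod_args = docker_run_args(
    "example/app-db:1.4",
    {"DB_PASSWORD": "change-me"},
    data_volume="/srv/app-db/data",  # hypothetical host directory
)

print(" ".join(dev_args))
print(" ".join(prod_args))
```

In the first command everything a developer needs is identified by the image tag alone; in the second, the result also depends on whatever is in the mounted directory, which is exactly the part that is not versioned.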
Thus, until the project is launched in production, the first option is more convenient. Once a project is in production, usually only option 2 is possible. If the SQL (DML) is carefully maintained by hand (for example, for a database of regulatory reference data), you can also use the conveniences of option 3 in production.
Ultimately, the choice comes down to what we make immutable and versioned, and what we keep fluid and "alive" but unversioned.
EDIT: Let's sort out a bit of terminology.
An image is what developers exchange with each other and what goes to production: a frozen snapshot from which a container is launched.
A container is a running process that uses CPU and memory. Its state is ephemeral, and its life is generally worth next to nothing. Running two containers from the same image gives the same state at start, but after that each one lives its own life.
A volume is a folder in the host's file system that the container sees as part of its own file system. Accordingly, a volume survives the creation, removal, starting and stopping of containers and keeps everything that was changed in it. It is important that the state of the volume is separate from the state of the container. In addition, a volume is accessed directly through the host file system, bypassing the container's file system layer. Under intensive disk IO this gives a performance benefit. The contents of a volume are not saved in the image, so they are not reproducible (unless the volume itself is packed into another image and container).
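The relationship between these three terms can be illustrated with a toy model. This is not how Docker is implemented; the `Image`, `Container` and `Volume` classes below are hypothetical and only mimic the lifetimes described above.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Image:
    """Frozen snapshot of files that containers are started from."""
    files: Dict[str, str]


@dataclass
class Volume:
    """Host-side storage whose lifetime is independent of any container."""
    files: Dict[str, str] = field(default_factory=dict)


class Container:
    def __init__(self, image: Image, volume: Optional[Volume] = None) -> None:
        # Every container starts from its own copy of the image contents.
        self.files = copy.deepcopy(image.files)
        self.volume = volume

    def write(self, path: str, content: str, to_volume: bool = False) -> None:
        if to_volume:
            if self.volume is None:
                raise ValueError("no volume attached to this container")
            # Writes to the volume land on the host and outlive the container.
            self.volume.files[path] = content
        else:
            # Writes to the container's own file system die with the container.
            self.files[path] = content


image = Image({"schema.sql": "CREATE TABLE t (id INT);"})
data = Volume()

first = Container(image, data)
second = Container(image)  # same starting state, separate life

first.write("tmp.log", "started")
first.write("rows.dat", "row 1", to_volume=True)
del first  # the container is gone, and so is tmp.log

third = Container(image, data)  # a new container attached to the same volume
print(sorted(third.files), sorted(third.volume.files))
```

Note that `first` is removed before `third` attaches to the same volume, in line with the warning that only one container at a time should work with a given set of data files.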
One more point: if you have two questions, it is better to post them as two separate questions. - Nick Volynkin ♦