Question: how to work with the database

There is a config file, docker-compose.yml (listed at the end of the question).

Do I understand the workflow correctly?

  1. I have an SQL file.

  2. I copy it into the lemp_mariadb container.

  3. I import the database there and everything works? Are any other actions needed? (A sketch of steps 2-3 follows below.)
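For reference, steps 2-3 could look roughly like this (a sketch; the database name app and the root credentials are assumptions - the compose file below does not set a password for the db service):

# copy the dump into the container, then import it
docker cp dump.sql lemp_mariadb:/tmp/dump.sql
docker exec lemp_mariadb sh -c 'mysql -uroot -ppassword app < /tmp/dump.sql'

# or skip the copy and stream the dump from the host
docker exec -i lemp_mariadb mysql -uroot -ppassword app < dump.sql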

And then, whenever the structure or the data changes, do I always have to go inside the container and make the changes there?

One more thing.

I use Laravel, which has artisan for migrations. As I understand it, that takes care of the structure, but what about data that can change?

Scenario: another developer downloads the application - will he also have to go into the container and set up the database?
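For what it's worth: since the compose file below mounts the application into the lemp_php container (with working_dir /var/www), migrations can be run from the host without entering the database container at all. A minimal sketch:

# run Laravel migrations inside the php container
docker exec lemp_php php artisan migrate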

version: '2'
services:
  nginx:
    image: evild/alpine-nginx:1.9.15-openssl
    container_name: lemp_nginx
    restart: always
    links:
      - php
    volumes:
      - ./project:/var/www/
      - ./docker/nginx/conf/nginx.conf:/etc/nginx/conf/nginx.conf:ro
      - ./docker/nginx/conf.d:/etc/nginx/conf.d:ro
    ports:
      - 8080:80
      - 443:443
  php:
    image: evild/alpine-php:7.0.6
    working_dir: /var/www
    container_name: lemp_php
    restart: always
    volumes:
      - ./project:/var/www/
    depends_on:
      - db
    links:
      - db
    environment:
      - DB_NAME=mysql
      - DB_USER=root
      - DB_PASSWORD=password
  db:
    image: mariadb:latest
    container_name: lemp_mariadb
    restart: always
    volumes:
      - db-data:/var/lib/mysql
  • One more thing: if you have two questions, it is better to post them as exactly two separate questions. - Nick Volynkin

2 answers

Yes, the rule of thumb is: structural changes (migrations) can happen automatically, while restoring data from a dump is done manually - simply because that operation does not belong to the category of normal maintenance operations. So it really does make sense to load the dump into the container by hand (you can keep the database directory on the host so you don't have to do this every time, or build a separate dev image with a full database). Forcing the container to restore the dump at startup and then modifying it over the course of development is not a good idea.
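One way to automate that first load: the official mariadb image executes any *.sql files found in /docker-entrypoint-initdb.d, but only when it initializes an empty data directory. A sketch (the file name dump.sql and the password are assumptions):

# first start with an empty data dir: the entrypoint imports the mounted dump
docker run -d --name lemp_mariadb_dev \
  -e MYSQL_ROOT_PASSWORD=password \
  -v "$(pwd)/dump.sql:/docker-entrypoint-initdb.d/dump.sql:ro" \
  mariadb:latest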

I don't see a big problem in the database diverging somewhat between developers - in theory this could produce missed bugs, but with responsible development practices it shouldn't affect the process much.

That said, the notion of a seed is gaining more and more ground in programming - populating the database with initial data that is guaranteed to be present and is de facto part of the application (for example, a multi-regional application launching in Moscow and St. Petersburg should create those two regions in the database during initial deployment, so that the application never has a chance to come up in an empty state). Unlike a full restore from a dump, this is a standard maintenance operation, so you can create a separate seed for development containing the minimum data set the developers need, and fully automate the process. With Laravel this looks like the optimal strategy, although it will take considerable effort to keep the data set up to date and will most likely require rethinking the identifier scheme (abandoning auto increment / serial).
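With Laravel, a development seed could be wired up roughly like this (a sketch; DevelopmentSeeder is a hypothetical class name):

# run a dedicated development seeder inside the php container
docker exec lemp_php php artisan db:seed --class=DevelopmentSeeder

# or rebuild the schema and reseed in one go
docker exec lemp_php php artisan migrate:refresh --seed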

  • volumes: - db-data:/var/lib/mysql - tell me, what is this line for? I understand that it somehow passes data either into the container or the other way around - ruslik
  • @ruslik yes, that's right, this is a mount of a host directory inside the container - this way the data is preserved between launches - etki
  • I don't really understand how Docker works here - why does data need to be preserved between runs? When I stop the container, is the data deleted? - ruslik
  • If I enter the container, add a new row to the database, stop the container and start it again - won't the new data still be there? - ruslik
  • @ruslik after a stop and start it will be there; after the container is destroyed it won't. Any change to a docker-compose service ends with the container being completely re-created, so the probability of losing the data is non-zero. - etki
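To illustrate the distinction from the comments above (a sketch using the container names from the question):

# stop and start the same container: the new row survives,
# since neither the container filesystem nor the volume is touched
docker stop lemp_mariadb
docker start lemp_mariadb

# destroy the container: its writable layer is gone; only what was
# written to the volume (/var/lib/mysql here) survives into the
# re-created container
docker rm -f lemp_mariadb
docker-compose up -d db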

A great deal of ink has already been spilled over stateful containers - containers holding a database in particular.

Here it is important to determine what we are trying to achieve. The options are as follows:

  1. Immutable database state. The runtime DB, its configs (possibly parameterized with environment variables at startup), the SQL (DDL) schema and the SQL (DML) data are all fixed at launch time. Up to the environment variables used, we get the same thing at every start. You can test it and be sure it will be the same for every developer. From time to time you can take an SQL dump and build a new version of the container image. Downside: with this approach we lose changes to the database state. Good for development. (A sketch of this option follows after the list.)
  2. Immutable configuration and schema. The runtime DB, configs and DDL are baked into the container and versioned. The data, in the form of the database's binary working files (and transaction logs), is attached via a volume, so it is not immutable. We handle the data separately: back it up, provide fault tolerance. There is a danger: only one container at a time can work with a given set of data files without the risk of corrupting them. Other developers, when launching the container, have to attach their own data set from somewhere, and it will differ from the first one. For development this is not great, but otherwise it is impossible to provide proper backups without losing data between backups.
  3. Unlike the previous option, the data can be stored in a separate container in the form of SQL (DML) and periodically reassembled, then attached to the runtime database as a volume. Again, in development this allows operating on a reproducible database configuration, data included, while the runtime (with configuration and schema) and the data are versioned separately. For the data container to be reproducible, it must be rebuilt each time from the SQL (DML) script. This can be convenient: deltas are smaller, and the SQL can be stored in Git, for example. The downside is that this is not always possible: storing a large database dump in Git, for example, is pointless - the diffs won't tell you anything.
  4. The option of backing up the current state of the container from production we do not even consider - it is an Ops Smell. We end up with an opaque binary that is unclear how to obtain again: it is not reproducible, and it carries a lot of excess garbage - logs, for example, or deltas preserved by the layered structure of the Docker container filesystem.
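A minimal sketch of option 1, assuming the mariadb base image and hypothetical schema.sql / data.sql files (the image's entrypoint runs /docker-entrypoint-initdb.d/*.sql against an empty data directory at launch, which matches "DDL and DML at the time of launch"):

# write a small Dockerfile and build a versioned, immutable dev image
cat > Dockerfile.db <<'EOF'
FROM mariadb:latest
# executed by the mariadb entrypoint on first start
COPY schema.sql data.sql /docker-entrypoint-initdb.d/
EOF

docker build -f Dockerfile.db -t myapp/db:dev .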

Thus, while the project has not yet launched in production, the first option is more convenient. Once the project is in production, usually only option 2 is possible. If the SQL (DML) is carefully maintained by hand (for example, for a database of regulatory reference data), the conveniences of option 3 can be used in production as well.

Ultimately, the choice is about what we make immutable and versioned, and what we leave fluid and "live" but unversioned.

EDIT: Let's sort out some terminology.

An image is what developers exchange with each other and what goes to production; a frozen snapshot from which containers are launched.

A container is a running process that consumes CPU and memory; its state is ephemeral, and its life, by and large, is worth nothing. By launching two containers from the same image we get the same state at the start, but from then on each lives its own life.

A volume is a folder from the host filesystem that the container perceives as part of its own filesystem. The volume therefore survives the birth, death, starting and stopping of containers, and keeps everything that was changed in it. The important point is that the volume's state is separate from the container's state. In addition, work with a volume goes directly to the host filesystem, bypassing the container's filesystem layers; under intensive disk IO this gives a performance benefit. The contents of a volume are not saved in the image and are therefore not reproducible (unless the volume itself is packed into another image and container).
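The same three concepts as seen from the CLI (a sketch):

docker images                  # images: frozen, shareable snapshots
docker ps -a                   # containers: processes (running or stopped) started from images
docker volume ls               # volumes: data that outlives any individual container
docker inspect lemp_mariadb    # shows, among other things, which volumes a container mounts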