
Tips for creating custom workflows in GitLab CI

Translator's note: the original article was written by Miłosz Smółka, one of the founders of the small Polish company Three Dots Labs, which specializes in "advanced backend solutions". The author draws on his experience of extensive GitLab CI use and shares the tips he has accumulated with other users of this Open Source product. After reading them, we realized how close the problems he describes are to our own, so we decided to share the proposed solutions with a wider audience.



This time I will cover more advanced topics in GitLab CI. A frequent task here is implementing non-standard features in a pipeline. Most of these tips are specific to GitLab, although some can be applied to other CI systems as well.

Running integration tests


As a rule, running unit tests is easy to hook into any CI system. This is usually no harder than invoking one of the commands built into your programming language's standard tooling. In such tests, you will most likely use mocks and stubs to hide implementation details and focus on testing specific logic. For example, you can use an in-memory database as storage or write stubs for HTTP clients that always return prepared responses.

However, sooner or later you will need integration tests to cover less ordinary situations. I will not go into a discussion of all possible types of testing; by integration tests I simply mean tests that use some kind of external resource. It can be a real database server, an HTTP service, attached storage, etc.

In GitLab, it is easy to run such dependencies as Docker containers linked to the container that runs your scripts. These dependencies are defined using services. They are available by image name, or by a name of your choice if you set the alias field.

Here is a simple example of using a MySQL service:

integration_tests:
  stage: tests
  services:
    - name: mysql:8
      alias: db
  script:
    - ./run_tests.sh db:3306

In this case, the test scripts need to connect to the db host. Using alias is usually a good idea because it lets you swap images without modifying test code. For example, you can replace the mysql image with mariadb, and the script will still work correctly.

Waiting for containers


Since service containers can take a while to start, you may need to wait before sending them any requests. A simple way is the wait-for-it.sh script with a defined timeout.
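For example, the integration test job above could wait for MySQL before running the tests. A minimal sketch, assuming wait-for-it.sh is committed to the repository and executable:

 integration_tests:
  stage: tests
  services:
    - name: mysql:8
      alias: db
  script:
    # Block until db:3306 accepts TCP connections; --strict fails the job on timeout
    - ./wait-for-it.sh db:3306 --timeout=60 --strict
    - ./run_tests.sh db:3306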

Using Docker Compose


For most cases, services should be sufficient. However, sometimes you need something more complex. One example is running Kafka and ZooKeeper in two separate containers (that is how the official images are built). Another is running tests with a dynamic number of nodes, as with Selenium. The best solution for running such setups is Docker Compose:

version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
    ports:
      - 9092:9092

If you run your own GitLab runners on trusted servers, you can start Docker Compose through the Shell executor. Another possible option is Docker-in-Docker (dind), but in that case read this article first.

One way to use Compose is to set up the environment, run the tests, and then tear everything down. A simple bash script looks like this:

docker-compose up -d
./run_tests.sh localhost:9092
docker-compose down
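Note one subtlety: in the script above, the job's exit code is that of docker-compose down, so failing tests would not fail the job, while adding set -e alone would skip the teardown on failure. A sketch that works around both with a shell trap:

 #!/bin/bash
set -e

docker-compose up -d
# Tear down on any exit; bash preserves the script's original exit status
trap 'docker-compose down' EXIT

./run_tests.sh localhost:9092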

As long as the job's environment has everything the tests need, this works fine. But there may be situations where you first need to install some dependencies... There is another way to run tests with Docker Compose: build your own Docker image containing the test environment, run the tests in one of the containers, and exit with the appropriate return code:

version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
  tests:
    image: registry.example.com/some-image
    command: ./run_tests.sh kafka:9092

Notice that we no longer need to map ports: in this setup, the tests can reach all services directly over the Compose network.

And everything is launched with a single command:

 docker-compose up --exit-code-from tests 

The --exit-code-from option implies --abort-on-container-exit, which means the whole environment started by docker-compose up is stopped as soon as any one of the containers exits. The exit code of the command equals the exit code of the selected service (the tests container in the example above). So if the command that runs the tests finishes with a non-zero code, the entire docker-compose up fails with it.
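Wired into the pipeline, the job might look like this (a sketch; it assumes a runner whose executor has Docker and Docker Compose available):

 integration_tests:
  stage: tests
  script:
    - docker-compose up --exit-code-from tests
  after_script:
    # Clean up containers and volumes regardless of the test outcome
    - docker-compose down -v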

Use of labels as CI tags


Warning: this is a rather unusual idea, but I have found it very useful and flexible.

As you may know, GitLab has a Labels feature available at the project and group levels. Labels can be set on issues and merge requests. However, they have no direct relationship with pipelines.



With a small workaround, you can access the merge request labels in job scripts. Since GitLab 11.6 this has become even easier: the CI_MERGE_REQUEST_IID environment variable is available (note: IID, not ID) when the pipeline uses only: merge_requests.
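A minimal sketch of such a job (the job name is arbitrary):

 check_labels:
  only:
    - merge_requests
  script:
    - echo "Merge request IID is $CI_MERGE_REQUEST_IID"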

If only: merge_requests is not used, or you are working with an older version of GitLab, the MR IID can still be obtained by calling the API:

 curl "$CI_API_V4_URL/projects/$CI_PROJECT_ID/repository/commits/$CI_COMMIT_SHA/merge_requests?private_token=$GITLAB_TOKEN" 

The field we need is iid. Keep in mind, however, that the API may return multiple MRs for a given commit.

Once you have the MR IID, it only remains to call the Merge Requests API and read the labels field from the response:

 curl "$CI_API_V4_URL/projects/$CI_PROJECT_ID/merge_requests/$CI_MERGE_REQUEST_IID?private_token=$GITLAB_TOKEN" 

Authorization


Unfortunately, at the moment it is not possible to use $CI_JOB_TOKEN to access the project API (at least not if the project is not public). If the project has limited visibility (internal or private), you will need to generate a personal API token to authorize against the GitLab API.



However, this is not the safest solution, so be careful: if the token falls into the wrong hands, it may grant write access to all your projects. One way to reduce the risk is to create a separate account with read-only access to the repository and generate a personal token for that account.

How safe are your variables?


Just a few versions ago, the Variables section was called Secret Variables, which sounds as though it were meant for reliable storage of credentials and other sensitive information. In reality, these variables are simply hidden from users without Maintainer rights. They are not encrypted on disk, and they can easily leak through environment variables in scripts.

Keep this in mind when adding any variables, and consider storing secrets in safer solutions (for example, HashiCorp Vault).

Use cases


What to do with the list of labels is up to you; for example, a label can act as a switch that enables optional jobs in the pipeline.


Call external API


Although GitLab already ships with a rich feature set, you will very likely want to integrate other tools into your pipelines. The simplest way to call them is, of course, with good old curl.

If you build your own tools, you can teach them to listen for GitLab webhooks (see the Integrations tab in the project settings). However, if you plan to use them with critical systems, make sure they meet your high-availability requirements.

Example: Grafana annotations


If you are working with Grafana, annotations are a great way to mark events on charts over time. They can be added not only manually in the GUI, but also by invoking the Grafana REST API.



To access the API, you will need to generate an API key. Consider creating a separate user with limited access.



Define two variables in the project settings: GRAFANA_URL and GRAFANA_APIKEY (both are used by the script below).


To make it reusable, put the script in a repository of common scripts:

#!/bin/bash
set -e

if [ $# -lt 2 ]; then
    echo "Usage: $0 <text> <tag>"
    exit 1
fi

readonly text="$1"
readonly tag="$2"
readonly time="$(date +%s)000"

cat >./payload.json <<EOF
{
    "text": "$text",
    "tags": ["$tag"],
    "time": $time,
    "timeEnd": $time
}
EOF

curl -X POST "$GRAFANA_URL/api/annotations" \
    -H "Authorization: Bearer $GRAFANA_APIKEY" \
    -H "content-type: application/json" \
    -d @./payload.json

Now you can call it from your CI configuration with the required parameters:

deploy:
  stage: deploy
  script:
    - $SCRIPTS_DIR/deploy.sh production
    - $SCRIPTS_DIR/grafana-annotation.sh "$VERSION deployed to production" deploy-production

These calls could also live inside the deploy.sh script to keep the CI configuration simpler.

Bonus: quick tips


GitLab has excellent documentation on all the keywords available for configuring CI. I do not want to duplicate its contents here, but I will point out a few useful cases. Click the headings to open the documentation on each topic.

Advanced use of only / except


By matching CI variables against patterns, you can define non-standard builds for certain branches. This can help, for example, to push urgent hotfixes, but do not abuse it:

only:
  refs:
    - branches
  variables:
    - $CI_COMMIT_REF_NAME =~ /^hotfix/

GitLab provides many predefined variables in every CI job; use them.

YAML anchors


Use them to avoid duplication.

From version 11.3, you can also use the extends keyword:

.common_before_script: &common_before_script
  before_script:
    - ...
    - ...

deploy:
  <<: *common_before_script
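The same structure expressed with extends instead of an anchor, a minimal sketch mirroring the example above:

 .common_before_script:
  before_script:
    - ...
    - ...

deploy:
  extends: .common_before_script

Unlike YAML anchors, extends also works across configuration files pulled in with include.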

Limiting artifacts


By default, all artifacts collected earlier in the pipeline are passed to every subsequent job. If you explicitly list the jobs whose artifacts you depend on, you can save time and disk space:

dependencies:
  - build

Or, conversely, skip all of them if none are required:

 dependencies: [] 

Git strategy


Skip cloning the repository if the job does not need the files at all:

variables:
  GIT_STRATEGY: none

That's all!

Thank you for reading! For feedback and questions, contact me on Twitter or Reddit.

More tips on GitLab can be found in the previous publications.


Source: https://habr.com/ru/post/436910/