Your own video platform: ffmpeg and video encoding quality. Part 2

Lenna loves to look good; she is a fashion model, after all. Legend has it that adding her to the title picture of an article about visual data processing gives +5 to the chance of upvotes.

I continue to describe the inner workings of video services. Today: notes on encoding parameters and how to choose them.

First part

Most codecs offer fairly balanced defaults that give a decent result without lengthy parameter tuning. However, things get more interesting once you add a large video archive, bitrate limits, compatibility with the client's equipment and a reasonable wish to preserve the quality of the original.

Unfortunately, there is no magic "encode it nicely" button, and no caniuse equivalent for encoding parameters either. We will have to understand how codecs work.

Introduction: Profiles

H264 has so many settings and parameters that its developers, in order not to get lost in them, decided to define a list of profiles: "good" configurations for different purposes. Quite a few standard profiles have been defined; on top of that, by setting your own encoding parameters you effectively create a profile of your own, which confuses everyone completely. So, unfortunately, it turned out the way it always does.

Initially, profiles were meant to tell you whether the final video would play on a given type of device, but today there is no clear mapping between device types and profiles.

In practice, I would single out three groups of parameters by how demanding they are to decode:

CABAC disabled: roughly the main and baseline profiles. These can still be used for latency-sensitive streaming;
CABAC enabled: roughly the high profile. For everything. Most modern (and not so modern) hardware can play it. Efficiency gain over main: 20% or more;
ten-bit sampling and other advanced parameters: roughly Hi10P. The problem with such profiles is the almost complete lack of hardware support and the higher demands on decoding hardware; even flagship phones cannot cope with such files. They can be used for a personal library if you are confident in your equipment. Another 10-20% gain in efficiency.
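
As a rough sketch, the three groups above could be mapped to libx264 flags like this. The group names (legacy, general, archive) and the profile_flags helper are made up for illustration; -profile:v and -pix_fmt are regular ffmpeg options, and high10 needs a libx264 build with 10-bit support.

```shell
# Hypothetical helper: print libx264 profile flags for one of the three
# groups described above. The group names are illustrative, not ffmpeg terms.
profile_flags() {
  case "$1" in
    legacy)  echo "-profile:v baseline -pix_fmt yuv420p" ;;    # CABAC off
    general) echo "-profile:v high -pix_fmt yuv420p" ;;        # CABAC on
    archive) echo "-profile:v high10 -pix_fmt yuv420p10le" ;;  # 10-bit
    *) echo "unknown group: $1" >&2; return 1 ;;
  esac
}

# Usage (file names are placeholders):
# ffmpeg -i input.mp4 -c:v libx264 $(profile_flags general) output.mp4
```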

The concept of profiles is not as developed for other codecs as it is for H264. For them, we can assume that if the codec is supported, it is supported entirely, and only an excessively high bitrate, or another parameter that is clearly too high, can get in the way of playback. However, as hardware VP8 and VP9 decoders spread, the situation may change.

Now to the individual parameters.

Color space

The choice of color space has virtually no effect on coding efficiency. This parameter could be left to the codec (it matters when processing raw, unencoded data), if it were not for one peculiarity: many players handle color space information in very odd ways, so for a large share of users the video may be displayed with distorted colors (mostly shifted toward green).

To preserve colors for most players, different H264 videos need to be encoded in different spaces:

for SD (width < 1280) - BT.601
for HD (width >= 1280) - BT.709
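
A minimal sketch of this rule as a shell helper. The color_flags name is hypothetical, and smpte170m is used here on the assumption that the SD source is NTSC-style BT.601; PAL material may need different values.

```shell
# Hypothetical helper: choose color tagging flags from the frame width,
# following the SD/HD rule above. smpte170m is ffmpeg's name for the
# NTSC flavor of BT.601 (an assumption about the source material).
color_flags() {
  if [ "$1" -ge 1280 ]; then
    echo "-color_primaries bt709 -color_trc bt709 -colorspace bt709"
  else
    echo "-color_primaries smpte170m -color_trc smpte170m -colorspace smpte170m"
  fi
}

# Usage (file names are placeholders):
# ffmpeg -i input.mp4 -c:v libx264 $(color_flags 1920) output.mp4
```

Keep in mind that these flags only label the stream. If the pixels themselves are in a different color space, they have to be converted with a filter first.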

There is an excellent 2012 study on this topic. Unfortunately, the situation with such bugs changes very slowly, and although some of the test results from that article are no longer relevant, these quirks still have to be taken into account. There is a chance that all this time you have been watching videos with the wrong colors, and it turns out that this was not the director's decision.
The problem is known for H264 decoders, other formats may not have this problem.

Frame rate

If your source is not a game stream or an action video, it makes sense to cap the frame rate at 25-30 frames per second: the fewer frames there are, the more data is left to describe each one. It is better to reduce the rate by an integer factor, so that frames are dropped evenly; otherwise the video may appear to stutter.
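
To illustrate the integer-factor rule, here is a small sketch that picks a target rate for the -r option. The helper names are hypothetical, the 30 fps cap is just the default argument, and the source rate is passed as a fraction so that rates like 60000/1001 work with integer arithmetic.

```shell
# Greatest common divisor, used to reduce the resulting fraction.
gcd() {
  a=$1; b=$2
  while [ "$b" -ne 0 ]; do t=$((a % b)); a=$b; b=$t; done
  echo "$a"
}

# Print the highest rate not above max_fps that equals the source rate
# divided by an integer. The source rate is given as num/den.
target_fps() {
  num=$1; den=$2; max_fps=${3:-30}
  k=1
  while [ "$num" -gt $((max_fps * den * k)) ]; do k=$((k + 1)); done
  new_den=$((den * k))
  g=$(gcd "$num" "$new_den")
  echo "$((num / g))/$((new_den / g))"
}

target_fps 60 1         # prints 30/1
target_fps 60000 1001   # prints 30000/1001
target_fps 24 1         # prints 24/1 (already below the cap)
```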

There is also such a thing as variable frame rate (VFR). VFR is inconvenient to work with for two reasons: first, it produces bitrate peaks in high-frame-rate sections that instantly drain the buffer; second, VFR complicates drawing up a conversion plan, forcing you to use Q-parameters (I wrote about them in the first article).

GOP size

Groups of pictures (GOPs) are blocks within which some frames can refer to the data of others. Increasing the GOP size improves codec efficiency in exchange for higher memory requirements. Large values are especially effective for files with uniform, cyclic movement (you know what I mean). Large values can also cause problems with seeking, since more data has to be decoded to restore a frame.
The name of the parameter, as well as the units of measurement, may differ from codec to codec - see the documentation.
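
For libx264 through ffmpeg, the -g option takes the GOP size in frames, so it is usually derived from the frame rate. A tiny sketch; the gop_frames helper and the idea of thinking in "seconds between keyframes" are assumptions made for illustration.

```shell
# Hypothetical helper: GOP length in frames for a given integer frame rate
# and a desired maximum distance between keyframes in seconds.
gop_frames() {
  fps=$1; seconds=$2
  echo $((fps * seconds))
}

gop_frames 25 10   # prints 250
gop_frames 30 2    # prints 60

# Usage (file names are placeholders):
# ffmpeg -i input.mp4 -c:v libx264 -r 25 -g "$(gop_frames 25 10)" output.mp4
```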

Slices

To speed up decoding (and encoding), the video can be split into parts of lower resolution. The idea is that processing four videos with a resolution of, say, 1280x720 is easier than processing one 2560x1440 video. This makes sense at resolutions above FHD. The more parts there are, the lower the codec efficiency. Splitting also simplifies multithreaded processing.

Anamorphic pixels

Rectangular pixels appear when the aspect ratio of the picture differs from the ratio of its width to its height in pixels. A typical case is widescreen DVD, where 16:9 video has a resolution of 704×480 (roughly 3:2). Playing such videos causes no problems, but when encoding you need to take both the resolution and the aspect ratio into account; otherwise it is easy to convert anamorphic pixels to standard square ones with a loss of efficiency (up to ~35%!), or even to end up with a horizontally squashed picture.
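
The pixel shape follows from simple arithmetic: pixel aspect ratio = display aspect ratio / (width / height). A sketch for the DVD case above; the sample_aspect helper is hypothetical.

```shell
# Greatest common divisor, used to reduce the resulting ratio.
gcd() {
  a=$1; b=$2
  while [ "$b" -ne 0 ]; do t=$((a % b)); a=$b; b=$t; done
  echo "$a"
}

# Print the pixel (sample) aspect ratio for a stored frame of
# width x height that should be displayed at dar_num:dar_den.
sample_aspect() {
  width=$1; height=$2; dar_num=$3; dar_den=$4
  sar_num=$((dar_num * height))
  sar_den=$((dar_den * width))
  g=$(gcd "$sar_num" "$sar_den")
  echo "$((sar_num / g)):$((sar_den / g))"
}

sample_aspect 704 480 16 9    # prints 40:33 (pixels wider than tall)
sample_aspect 1280 720 16 9   # prints 1:1  (square pixels)
```

If the result is not 1:1, the source is anamorphic, and the output needs either the same resolution plus an explicit -aspect value or a rescale to square pixels.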

Bit rate control

There are three main modes of operation for codecs related to bitrate:

constant bit rate, CBR, when the quality drops in proportion to the complexity of the scene;
constant quality, const Q VBR, when the bit rate increases in proportion to the complexity of the scene;
limited bitrate and quality - classic VBR.

Note that most encoders (including ffmpeg) do not switch the codec to CBR mode when you specify a bitrate: they produce VBR files with limits that are not always documented (CBR mode is usually enabled by setting minrate and maxrate to the same value).

VBR is well suited for online playback (and for streaming as well): it gives better quality than CBR and still lets you fit the stream into the client's Internet connection.

The choice of maxrate / minrate depends on the client's connection; a spread of more than 20% is best avoided.
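
A sketch of how such limits could be derived from a target bitrate. The rate_flags helper is hypothetical; applying the 20% spread upward only and using a buffer of twice the target bitrate are assumptions, not recommendations from this article. The -bufsize flag is included because x264 ignores maxrate when no buffer size is set.

```shell
# Hypothetical helper: print bitrate flags for a constrained VBR encode.
#   $1 - target bitrate in kbit/s
#   $2 - allowed overshoot in percent (default 20, applied upward only)
rate_flags() {
  target=$1; spread=${2:-20}
  maxrate=$((target * (100 + spread) / 100))
  bufsize=$((target * 2))   # assumption: buffer of twice the target bitrate
  echo "-b:v ${target}k -maxrate ${maxrate}k -bufsize ${bufsize}k"
}

rate_flags 2000      # prints -b:v 2000k -maxrate 2400k -bufsize 4000k
rate_flags 1000 10   # prints -b:v 1000k -maxrate 1100k -bufsize 2000k
```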

Multipass coding

The distribution of data across a file in VBR mode is hard to predict, so the codec has to guess, and it does not always guess right. In multipass mode the codec first builds a map of the required bitrate and only then encodes. This improves video quality in complex and dynamic scenes (pay attention, for example, to the amount of "moiré" and to the transitions between scenes). Since the first pass only analyzes the source file, this mode, contrary to popular belief, does not double the processing time; it adds only about 10-15%.
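
A minimal sketch of a two-pass libx264 run. The two_pass_cmds helper is hypothetical and only prints the two commands so they can be inspected first; file names containing spaces would need extra quoting.

```shell
# Hypothetical helper: print the two ffmpeg invocations of a two-pass
# libx264 encode. File names and bitrate are supplied by the caller.
two_pass_cmds() {
  input=$1; output=$2; bitrate=$3; log=${4:-ffmpeg2pass}
  common="-c:v libx264 -b:v $bitrate -passlogfile $log"
  # Pass 1: analysis only, so audio is skipped and the video is discarded.
  echo "ffmpeg -y -i $input $common -pass 1 -an -f null /dev/null"
  # Pass 2: the real encode, using the statistics gathered in pass 1.
  echo "ffmpeg -i $input $common -pass 2 $output"
}

two_pass_cmds input.mp4 output.mp4 2000k
```

Both passes must use the same video settings. When several files are processed in parallel, each one needs its own log file name (the fourth argument here).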

-tune

For different types of source material there are ready-made tunings that adjust some basic encoding parameters, such as deblocking filter levels and psycho-visual optimization parameters. Using them improves how the video is perceived and works well if you know the source type in advance or have a structured set of videos (in the case of mass processing).

Available values:

film - for movies and anything with a complex frame structure;
animation - for videos with large flat-colored areas. The choice follows the picture, not the genre: animation with complex, detailed frames is still better encoded with film;
stillimage - for video with almost no movement; a good optimization for those songs in mp4 format where the background is the album cover throughout the video (someone tell them that even flac cannot weigh 300MB for 10 minutes!);
grain - for encoding "noisy" sources, such as surveillance cameras;
psnr / ssim - for evaluating the effect of the other codec parameters with these metrics;
fastdecode - for weak devices; disables CABAC and the deblocking filter to make decoding cheaper;
zerolatency - as the name implies, for low-latency streaming.

Pixel format

The format and bit depth strongly affect how files are compressed and decompressed, and in what form quality is lost. The main parameters that describe a pixel format are:

method of decomposing colors into components - YUV, RGB;
color subsampling parameters (quite a mouthful; "chroma subsampling" is the more familiar term), where some color components are stored at a lower resolution;
the depth of the color components in bits.

A deliberate choice of pixel format requires separate analysis and data collection, and depends heavily on the type of source material.

Briefly:

not all codecs (and, most importantly, decoders) support every possible format;
some formats are more demanding on resources - this is what sets Hi10P apart from the plain high profile;
sub-sampled formats can give a noticeable gain in compression efficiency, but quality loss is harder to control.
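
To get a feel for what subsampling and bit depth cost before any compression, here is a small sketch that computes the raw size of one frame. The raw_frame_bytes helper is hypothetical; the pixel format names are the ones ffmpeg uses for -pix_fmt, and only a few of them are covered.

```shell
# Hypothetical helper: raw size in bytes of one frame of width x height.
# Assumes even dimensions, so chroma planes divide without remainder.
raw_frame_bytes() {
  w=$1; h=$2
  case "$3" in
    yuv420p)     echo $((w * h * 3 / 2)) ;;  # 8-bit, chroma at 1/4 resolution
    yuv422p)     echo $((w * h * 2)) ;;      # 8-bit, chroma at 1/2 resolution
    yuv444p)     echo $((w * h * 3)) ;;      # 8-bit, full-resolution chroma
    yuv420p10le) echo $((w * h * 3)) ;;      # 10-bit stored in 2 bytes/sample
    *) echo "unknown pixel format: $3" >&2; return 1 ;;
  esac
}

raw_frame_bytes 1920 1080 yuv420p   # prints 3110400
raw_frame_bytes 1920 1080 yuv444p   # prints 6220800
```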

Interlaced

Interlacing was invented to double the perceived frame rate at minimal cost: the bitrate and resolution stay the same, while the rate is higher. However, with fast movement the "comb" becomes noticeable: lines left over from the previous field. Filters can remove the effect without dropping frames or reducing vertical resolution, but they reduce sharpness. If the video will be played in a browser, it is better to filter out interlacing during encoding, because real-time filtering on the client will not give the best visual results.
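
A sketch of deinterlacing at encode time with ffmpeg's yadif filter. The deinterlace_flags helper is hypothetical, and it assumes that ffprobe reports field_order as progressive, tt, bb, tb, bt or unknown.

```shell
# Hypothetical helper: given the field_order value reported by ffprobe,
# print a -vf argument that deinterlaces, or nothing otherwise.
deinterlace_flags() {
  case "$1" in
    tt|bb|tb|bt) echo "-vf yadif" ;;  # interlaced variants
    *)           echo "" ;;           # progressive or unknown: leave as is
  esac
}

# Usage (file names are placeholders):
# order=$(ffprobe -v error -select_streams v:0 \
#           -show_entries stream=field_order -of csv=p=0 input.mp4)
# ffmpeg -i input.mp4 $(deinterlace_flags "$order") -c:v libx264 output.mp4
```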

Putting it all together

Example for x264:

ffmpeg -i [source]
-c:v libx264
-b:v [bitrate] # target bitrate
-maxrate [bitrate] # limits the bitrate deviation
-bufsize [size] # x264 ignores maxrate if no buffer size is set
-r [framerate]
-g [size] # GOP in frames
-aspect [ratio, for example 16:9] # if the source is anamorphic
-profile:v high # the easiest way to enable CABAC
-color_primaries bt709 # set the color space explicitly, without relying on the codec
-color_trc bt709
-colorspace bt709
-slices 4 # encode in separate lower-resolution blocks
-threads 4
-tune [value]
-map_metadata:g -1 # strip metadata, we do not need it online
-map_metadata:s:v -1
-map_metadata:s:a -1
-map_chapters -1
-pass [1|2] # for multipass encoding
-passlogfile [file] # if you process files in parallel
# -map ..., -c:a ..., -ac ..., -b:a ..., filters, resolution - to taste
[destination]

Of course, a single article cannot cover everything, but I am sure this material is enough to improve the quality of many videos.

Read the documentation and experiment.

Materials:

ffmpeg.org/ffmpeg-all.html
en.wikipedia.org/wiki/H.264/MPEG-4_AVC#Profiles
en.wikipedia.org/wiki/Chroma_subsampling
en.wikipedia.org/wiki/Color_space
en.wikipedia.org/wiki/YUV

In addition to the example from the previous article, I have learned about another installation of my code. I tried to take the examples in this article from these sites, but despite this:
* I have no direct relation to the authors of the sites mentioned and may not share their views and opinions. I cannot comment on decisions about who gets access to the code and how.

Ready to answer questions.

Source: https://habr.com/ru/post/437936/