How does the VLA work?

Question

int N; ... int arr[N];

What is special about how this is implemented (C99)?

Why does it not work in C++ (even though g++ and clang support it)?

You have combined several very broad, completely unrelated questions into one.
@AnT, what confused me was that the array lives on the stack yet has a dynamic size.

Accepted Answer · 2019-01-05T19:57:18

For some strange reason, whenever VLAs come up, the discussion most often focuses on the ability to create arrays of run-time size in local memory, i.e. on the stack. This is puzzling, because the ability to declare such arrays locally is a purely secondary property of VLAs that plays no significant role in what they offer. As a rule, this ability and its hidden pitfalls are pushed to the foreground by blind critics of VLAs in order to divert the discussion from the essence of the matter.

The essence of the matter is that VLA support is first and foremost a powerful extension of the language's type system. It introduces into C a fundamentally new conceptual group of types: variably modified types. All the most important internal implementation details of a VLA are tied to its type, not to the object itself. The introduction of variably modified types is the bulk of the VLA iceberg, and the ability to create objects of such types in local memory is nothing more than its minor (and optional) tip.

For example, whenever a program declares a type

    /* Inside a block */
    int n = 10;
    ...
    typedef int A[n];

the characteristic of this variably modified type, namely the value of n, is fixed at the moment control passes over the typedef declaration. Changing n after the alias A has been declared no longer affects the type A. In implementation terms, this means that a hidden internal variable describing the array size is associated with the type A. That hidden variable is initialized when control passes over the type declaration.
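The freezing of the size at the typedef can be observed directly. The following is a minimal sketch, not taken from the answer; the helper name frozen_length is hypothetical and the code assumes a C99 compiler with VLA support.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reports how many elements the type A has
   after n has been modified following the typedef. */
size_t frozen_length(int n)
{
    assert(n > 0);          /* a VLA size must be positive */

    typedef int A[n];       /* the size expression is evaluated here */
    n *= 2;                 /* changing n afterwards does not touch A */

    return sizeof(A) / sizeof(int);
}
```

Calling frozen_length(10) yields the element count captured at the typedef, regardless of the later assignment to n.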

This gives the typedef declaration an unusual and interesting property: it not only generates executable code, it generates critically necessary executable code. For this reason C acquires a property previously unknown to it (but familiar from C++): the language prohibits transferring control from outside the scope of an entity of variably modified type into that scope.

    /* Inside a block */
    int n = 10;
    goto skip; /* ERROR: invalid transfer of control */
    typedef int A[n];
    skip:;

I emphasize once again that the code above defines no VLA at all; it only declares a typedef alias for a variably modified type. Nevertheless, jumping over such a typedef declaration is not allowed.

When an actual VLA is defined, then in addition to the memory allocated for the array elements, exactly the same kind of hidden variables are created to store the array dimensions. The array itself is implemented through an ordinary pointer, and the memory is allocated by a mechanism similar to alloca:

    /* Inside a block */
    int n = 10, m = 20;
    typedef int A[n][m];
    A a;
    a[5][6] = 42;

    /* ... is translated into ... */

    int n = 10, m = 20;
    size_t _internal_A1 = n, _internal_A2 = m;
    int *a = alloca(_internal_A1 * _internal_A2 * sizeof(int));
    a[5 * _internal_A2 + 6] = 42;

(Of course, code is also generated to release the allocated memory at the end of the block in which the array a is defined.)
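The index arithmetic in this translation can be checked with a small sketch. It is an illustration under stated assumptions rather than what any compiler literally emits: the helper name flat_index_matches is hypothetical, and calloc is used in place of alloca to keep the code portable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: writes through a pointer to a variably modified
   row type and reads the same element back with manual flat indexing.
   Returns 1 if both views agree, 0 if not, -1 if allocation fails. */
int flat_index_matches(size_t n, size_t m, size_t i, size_t j)
{
    assert(n > 0 && m > 0 && i < n && j < m);

    int *flat = calloc(n * m, sizeof *flat);
    if (flat == NULL)
        return -1;

    int (*a)[m] = (int (*)[m])flat;  /* view the block as int[n][m] */
    a[i][j] = 42;                    /* the compiler computes i * m + j */

    int ok = (flat[i * m + j] == 42);
    free(flat);
    return ok;
}
```

The value of m used for the row stride comes from the type of a, not from anything stored inside the memory block itself.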

It should be understood, however, that these hidden variables are associated not so much with the array itself as with its variably modified type. If the code declares several VLAs and/or variably modified types with identical run-time characteristics, they can in principle share the same hidden variables for storing their sizes.

An important and remarkable consequence follows: the additional size information associated with a VLA is not embedded in the object representation of the array itself, but is stored alongside it, in a completely separate and independent way. As a result, the object representation of a VLA with any number of dimensions is fully compatible with the object representation of a classic "fixed" array with the same number of dimensions and the same sizes. For example:

    /* Inside a block */
    unsigned n = 5;
    int a[n][n + 1][n + 2]; /* VLA */
    int (*p)[5][6][7];      /* Pointer to a "classic" array */
    p = &a;                 /* The assignment is valid because the array sizes match */
    (*p)[1][2][3] = 42;     /* Behavior is defined: `a[1][2][3]` receives the value 42 */

Or, taking the common need to pass an array to a function:

    void foo(unsigned n, unsigned m, unsigned k, int a[n][m][k]) {}
    void bar(int a[5][5][5]) {}

    int main(void)
    {
      unsigned n = 5;
      int vla_a[n][n][n];
      bar(vla_a);

      int classic_a[5][6][7];
      foo(5, 6, 7, classic_a);
    }

Both function calls in the code above are perfectly correct and their behavior is fully defined by the language, even though we pass a VLA where a classic array is expected and vice versa. Of course, the compiler will not check the correctness of such calls, i.e. that the actual sizes of parameters and arguments match (although, if desired, both the compiler and the user can generate checking code in debug mode).

(Note: as usual, parameters of array type, VLA or not, are always implicitly adjusted to parameters of pointer type. In the example above the parameter a therefore actually has the type int (*)[m][k], and the value of n does not affect this type. I deliberately gave the array extra dimensions so that it would not lose its variably modified nature.)

Compatibility is also ensured by the fact that, when a VLA is passed to a function, the compiler does not need to accompany it with any hidden extra information about its size. The language syntax forces the author of the code, willy-nilly, to pass this information explicitly and in the open. In the example above, the author of foo first had to list the parameters n, m and k in the parameter list, because without them the parameter a could not be declared (see also the note above about n). It is these parameters, passed explicitly by the user, that bring the information about the actual size of the array a into the function.
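A short sketch can make both points concrete: the sizes reach the function only through the explicit parameters, and the outermost dimension plays no part in the adjusted parameter type. The names slice_bytes and slice_bytes_demo are hypothetical and not part of the answer.

```c
#include <assert.h>
#include <stddef.h>

/* The parameter a is adjusted to int (*)[m][k]; n is not part of its type. */
size_t slice_bytes(size_t n, size_t m, size_t k, int a[n][m][k])
{
    assert(m > 0 && k > 0);
    (void)n;
    return sizeof *a;   /* one int[m][k] slice, computed at run time from m and k */
}

/* Hypothetical driver: passes a classic fixed-size array to the VLA parameter. */
size_t slice_bytes_demo(void)
{
    int fixed[2][3][4] = { { { 0 } } };
    return slice_bytes(2, 3, 4, fixed);
}
```

Here sizeof *a depends only on the values of m and k supplied by the caller, which is exactly the information the author was forced to pass explicitly.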

VLA declarations do not allow an initializer to be specified, which also rules out the use of VLAs in compound literals:

    int n = 10;
    int a[n] = { 0 }; /* ERROR: an initializer may not be specified */

The reason for this restriction, as I recall, is that there is no good answer to the question of what to do if some initializers turn out to be “redundant”.
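Since an initializer is not available, a common workaround is to fill the array explicitly after its declaration. A minimal sketch, with the hypothetical helper name zeroed_vla_sum:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero-fills a local VLA without an initializer
   and returns the sum of its elements. */
int zeroed_vla_sum(size_t n)
{
    assert(n > 0);

    int a[n];                 /* no initializer allowed here */
    memset(a, 0, sizeof a);   /* sizeof a is n * sizeof(int), evaluated at run time */

    int sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}
```

Every element is zero after the memset call, so the returned sum is zero for any positive n.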

Using these valuable properties of VLAs, we can write, for example, the following code:

    #include <stdio.h>
    #include <stdlib.h>

    void init(unsigned n, unsigned m, int a[n][m])
    {
      for (unsigned i = 0; i < n; ++i)
        for (unsigned j = 0; j < m; ++j)
          a[i][j] = rand() % 100;
    }

    void display(unsigned n, unsigned m, int a[n][m])
    {
      for (unsigned i = 0; i < n; ++i)
        for (unsigned j = 0; j < m; ++j)
          printf("%2d%s", a[i][j], j + 1 < m ? " " : "\n");
      printf("\n");
    }

    int main(void)
    {
      int a1[5][5] = { 42 };
      display(5, 5, a1);
      init(5, 5, a1);
      display(5, 5, a1);

      unsigned n = rand() % 10 + 5, m = rand() % 10 + 5;
      int (*a2)[n][m] = malloc(sizeof *a2);
      init(n, m, *a2);
      display(n, m, *a2);
      free(a2);
    }

Please note: this program makes active and substantial use of the valuable properties provided by variably modified types, and it cannot be implemented elegantly without them. Yet this program does not create a single VLA in local memory (!), so that popular line of VLA criticism simply does not apply to this code.

Because the language has VLAs, it is possible to build cryptic constructs whose practical value is doubtful and whose behavior is not always obvious. For example, the following function declarations are permitted:

    /* At file scope */
    #include <stdio.h>

    int n = 100;
    void foo(int a[n++]) {}
    void bar(int m, int a[++n][++m]) {}

    int hello(void) { return printf("Hello World\n"); }
    void baz(int a[hello()]) {}

The expressions used in VLA declarations within function declarations are faithfully evaluated, side effects included, on every call to the function. Note that although parameters of array type are adjusted to parameters of pointer type, this does not remove the need to evaluate the expression that specified the array size in the original declaration. In this example, every call to baz is accompanied by the output of the string "Hello World\n".
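The per-call evaluation can be observed with a counter. The sketch below is an illustration with hypothetical names (evaluations, counted_columns, takes_rows, count_evaluations). It places the side effect in an inner dimension, which stays part of the adjusted parameter type, and the size function always returns 5 so that the argument type matches.

```c
#include <assert.h>

static int evaluations = 0;

/* Hypothetical size function with a visible side effect. */
static int counted_columns(void)
{
    ++evaluations;
    return 5;
}

/* The inner dimension belongs to the adjusted parameter type
   int (*)[counted_columns()], so it is evaluated on entry. */
static void takes_rows(int rows, int a[rows][counted_columns()])
{
    (void)rows;
    (void)a;
}

/* Calls takes_rows the given number of times and reports how many
   times the size expression was evaluated. */
int count_evaluations(int calls)
{
    assert(calls >= 0);

    int data[2][5] = { { 0 } };
    evaluations = 0;
    for (int i = 0; i < calls; ++i)
        takes_rows(2, data);
    return evaluations;
}
```

With GCC and Clang the counter is expected to advance once per call, even though the function body never touches the array.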

The [C++] tag on the question is out of place. It is often claimed in popular discussions that some compilers (GCC) also support VLAs in C++ code. In fact, the mere fact that some compilers in C++ mode allow non-constant expressions as array sizes does not indicate support for C99-style VLAs in C++ code. C and C++ are substantially different languages, and supporting C99 VLAs in C++ is difficult or even impossible. Simple experiments show that the behavior of the pseudo-VLAs in GNU C++ differs fundamentally from the standard behavior of C99 VLAs. For example, this code

    #include <stdio.h>

    int main(void)
    {
      int n = 10;
      typedef int A[n];
      n = 20;
      A a;
      printf("%zu\n", sizeof a / sizeof *a);
    }

will print 10 in C mode (as it should), but 20 when compiled in GNU C++ mode. Clearly, the concept of a "typedef that generates executable code" does not fit the fundamental ideas of the C++ language.

It may be worth clarifying that by ignoring the value of n at the point of the typedef int A[n], g++, unlike gcc, makes the code "more robust against programmer mistakes" (the array size corresponds to the current value of n at the point where the array is created).

KoVadim · Answer 2 · 2018-11-20T14:49:35

A VLA is an attempt to make the programmer's life easier by letting them create arrays in an "intuitive way" (the quotes are deliberate).

A VLA is usually implemented through an alloca-like mechanism and allocated on the stack. For the compiler this amounts to merely adjusting the stack pointer register, so it is very, very fast. On the other hand, the stack is not unlimited, so such an array is bounded by the stack size, commonly 1 or 8 megabytes by default (for example, 1 MB on Windows and 8 MB on Linux).
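One way to combine the speed of stack allocation with protection against stack overflow is to use a VLA only below some size threshold and fall back to the heap above it. The following is a minimal sketch; the threshold STACK_LIMIT_ELEMS and the function names are hypothetical and would need tuning for a real platform.

```c
#include <assert.h>
#include <stdlib.h>

enum { STACK_LIMIT_ELEMS = 1024 };  /* hypothetical threshold */

/* Fills buf with 0, 1, ..., n - 1 and returns the sum of the elements. */
static long fill_and_sum(size_t n, int buf[n])
{
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        buf[i] = (int)i;
        total += buf[i];
    }
    return total;
}

/* Returns 0 + 1 + ... + (n - 1), or -1 if heap allocation fails. */
long sum_of_indices(size_t n)
{
    assert(n > 0);

    if (n <= (size_t)STACK_LIMIT_ELEMS) {
        int local[n];                       /* small: cheap stack allocation */
        return fill_and_sum(n, local);
    }

    int *heap = malloc(n * sizeof *heap);   /* large: keep it off the stack */
    if (heap == NULL)
        return -1;
    long total = fill_and_sum(n, heap);
    free(heap);
    return total;
}
```

The same function fill_and_sum serves both branches, because the parameter int buf[n] is an ordinary pointer regardless of where the memory came from.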

Why does it not work in C++ (even though g++ and clang support it)?

C++ has vector, which is the proper kind of array, so there is no need for VLAs. g++ has them because that turned out to be simpler: otherwise the compiler would need an extra condition and extra tests, and this way C sources can be used in C++ code with less pain. Although I think it was really just a bug that was later passed off as a feature.

German Borisov · Answer 3 · 2018-11-20T13:39:18

VLA in C99 is an attempt to backport std::vector from C++ to C.

It works roughly the same way, but hides all the details behind compiler "magic".

With std::vector available, VLAs make no sense in C++.

Moreover, they introduce unexpected problems, for example because sizeof(arr) cannot be computed at compile time, which templates may rely on.


I don't quite see what the problem is with "sizeof cannot be computed at compile time". In C++ almost any expression may or may not be evaluable at compile time, depending on external factors, and "templates may rely on that" too. What makes such a sizeof any worse than the rest? - AnT
Besides, VLAs have nothing in common with std::vector. std::vector is a resizable array allocated in dynamic memory. A VLA is a fixed-size array allocated on the stack. There is no "attempt to backport" here. - AnT
In all other cases sizeof is a constant expression and can, for example, be used in an #if directive - German Borisov
First, sizeof can never be used in #if, or in the preprocessor in general. (Where did you get that from?) Second, once again: in C++ almost any expression can be constant or non-constant depending on its operands. If VLAs were allowed in C++, sizeof would not stand out in this respect. - AnT
"With std::vector available, VLAs make no sense in C++." Except that a VLA is created faster, because it does not need to search for memory on the heap. - HolyBlackCat
