Let me try to explain.

In any programming language, everyone always strives to make the program consume as little memory as possible, and to avoid leaks. Especially in C++, where you need to allocate and free memory correctly.

Each data type occupies a certain amount of memory, measured in bytes.

I will write my examples in C#.

Please explain: why do programmers, knowing, for example, that a loop will run over an array 10 elements long, and that this length is constant, still write

for (int i = 0; i < someVar.Length; i++) { // do something } 

Why an int counter? Why not sbyte or byte ? Yes, maybe within one such block it makes no difference. But there are many such blocks throughout the code.

Similarly with arrays. Knowing exactly the length ( Edit: since answers have already been posted, let me clarify, as the question came out a little confused: not the length, of course, but the maximum values of the elements; for example, that there will be numbers from 1 to 100 ), people still often declare

 int[] arr = new int[8]; 

instead of

 ushort[] arr = new ushort[8]; 

or

 byte[] arr = new byte[8]; 

This array will occupy memory until it is garbage-collected or the program finishes its work.

Hence the question: why does everyone do this? Is there something I don't know yet, or is it just programmer laziness?

  • 1) for (int i = 0; i < someVar.Length; i++) {...} - Length is itself an int, so why should I use a smaller data type? And what if tomorrow there are suddenly more elements than I expected? Change it in a bunch of places? Do I need that? 2) "Yes, maybe within one such block it makes no difference. But there are many such blocks throughout the code" - this is not the place to optimize. 3) ushort[] arr = new ushort[8]; - doesn't this change the meaning of the original array? You had an array of int, and now you have an array of ushort. This is not a matter of array length, but of its values. - BOPOH
  • "In any programming language, everyone always strives to make the program consume as little memory as possible" - this is no longer the case. Modern programs prefer to spend more memory to speed up computation. Any cache is a counterexample to your statement. - VladD
  • In addition, the programmer is likely to use foreach - kandi
  • @danpetruk why use foreach when the size is known? Why the overhead of getting an enumerator, etc.? - Dmitry
  • @Dmitry: The ease of maintaining the code and the obvious absence of errors more than cover the cost of one unfortunate iterator. Which, by the way, will most likely be optimized away anyway. - VladD

4 Answers

Your basic assumption — saving memory at all costs — is incorrect.

For example, the processor works much more efficiently with the int data type than with byte , so programmers and the compiler use int as much as possible to speed things up, accepting a small loss of memory in exchange for performance. In the same way, the double data type is usually used instead of float because it is faster, although it takes twice the space.
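If you want to see this for yourself, here is a rough micro-benchmark sketch (the class name CounterBench is made up; results vary by machine, runtime, and JIT settings, and the JIT may optimize parts of it away, so treat it as an illustration, not a measurement methodology):

 using System;
 using System.Diagnostics;

 class CounterBench
 {
     static void Main()
     {
         const int n = 100_000_000;

         byte b = 0;
         var sw = Stopwatch.StartNew();
         for (int i = 0; i < n; i++) b += 1; // byte math is widened to int, then truncated back
         sw.Stop();
         Console.WriteLine($"byte: {sw.ElapsedMilliseconds} ms (b = {b})");

         int v = 0;
         sw.Restart();
         for (int i = 0; i < n; i++) v += 1; // int is the CPU's natural operand size
         sw.Stop();
         Console.WriteLine($"int:  {sw.ElapsedMilliseconds} ms (v = {v})");
     }
 }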

Then there is data structure alignment: compilers insert padding bytes between the fields of data structures so that each field is aligned, which makes access faster.
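A minimal sketch of what this looks like in C# (the struct and field names are made up for illustration; the exact size depends on the runtime and platform):

 using System;
 using System.Runtime.InteropServices;

 struct Padded
 {
     public byte Flag;  // 1 byte
     public int Value;  // 4 bytes, aligned to a 4-byte boundary
 }

 class Program
 {
     static void Main()
     {
         // 1 + 4 = 5 bytes of fields, but padding after Flag
         // typically brings the total to 8.
         Console.WriteLine(Marshal.SizeOf<Padded>());
     }
 }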

Another example is loop unrolling and function inlining. An optimizing compiler unrolls the loop

 for (int i = 0; i < 10; i++) f(i); 

into

 f(0); f(1); f(2); f(3); f(4); f(5); f(6); f(7); f(8); f(9); 

because it is faster. No one notices a win of three bytes; everyone feels a win of three milliseconds.


Next, regarding the loop. In modern programming, code should be general enough that the emphasis falls not on low-level optimizations but on semantics, on the meaning of the code. Therefore, low-level details are hidden wherever possible.

From this point of view, the detail that the data array has exactly 10 elements, so that 4 bits (savings!) would suffice to index it, is simply ignored. Besides, the validity of indexing with a "narrow" data type is not checked by the compiler, which means that if tomorrow this code receives an array with more elements as input, it will silently stop working.
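To make that failure mode concrete, here is a minimal sketch (assuming C#'s default unchecked context) of a "narrow" counter going wrong:

 var data = new int[300]; // more elements than a byte can index
 for (byte i = 0; i < data.Length; i++)
 {
     // i is promoted to int for the comparison, but i++ silently
     // wraps from 255 back to 0, so this loop never terminates.
 }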

The correct modern approach to iterating over the elements of an array is:

 foreach (var item in array) { // process item } 

Here we abstract away from the size of the array (the code works with any size), from the specific traversal method (we do not explicitly encode the order in which elements are enumerated), and from the array itself (the same code works with a list), thereby shifting the low-level optimization work to the compiler and making the code easier to maintain.
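For illustration, the same foreach body serves arrays and lists alike (a minimal sketch; the Sum method name is made up here):

 using System;
 using System.Collections.Generic;

 class Program
 {
     static int Sum(IEnumerable<int> items)
     {
         int total = 0;
         foreach (var item in items) // identical code for any enumerable
             total += item;
         return total;
     }

     static void Main()
     {
         Console.WriteLine(Sum(new[] { 1, 2, 3 }));         // array
         Console.WriteLine(Sum(new List<int> { 1, 2, 3 })); // list
     }
 }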


Do not make simple, trivial things difficult. Save the power of your thinking for genuinely complex tasks and for algorithmic optimization. Let the compiler optimize for you; believe me, it does this better.

  • I would not be so categorical. Consider random access by array index, where a char[] array fits entirely in the cache while an int[] array (four times larger) no longer does. The difference can be severalfold. With sequential access this may not show up because of prefetching. But in almost any case (imho) (in C code on modern Intel) char (absent additional algorithmic costs) will be no slower than int. - avp
  • @avp: Yes, but I would put the emphasis on maintainability. There is no point in saving where the profiler has not shown a need for saving. I also think that at reasonably good optimization settings the compiler turns char into an int. - VladD
  • Hardly the compiler (at least in C). In fact, the conversion (to long) occurs naturally when the data is loaded into a register. - avp
  • @avp: That is quick to check. I'll do it now. - VladD
  • @avp: Yeah, you're right, I put it badly. My gcc did unroll the loop, by 4 iterations. As for the conversion to the wider type, it happens whenever it does not unroll. - VladD

Declaring, in this loop,

 for (int i = 0; i < someVar.Length; i++) { // do something } 

the variable i with a type such as byte makes no sense, since in the condition

 i < someVar.Length 

it will be converted to type int anyway. That is, extra instructions will be added when the object code is generated. Moreover, this variable is local, so it does not significantly affect memory usage.

As for arrays, one usually declares an array of the type whose values are actually needed. It is not always possible to say in advance what the upper and lower bounds of the array elements will be.

If you know that an array of type byte satisfies your needs, then you can define an array of that type. However, once again, in various arithmetic and other operations the elements of your array will constantly be converted to type int .
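As a minimal C# sketch of this promotion:

 byte a = 10, b = 20;
 // byte + byte is computed as int + int, and the result is int,
 // so storing it back into a byte requires an explicit cast.
 byte sum = (byte)(a + b);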

There are other related problems. For example, in C++ you can declare an array of type int8_t . However, this type is usually an alias for signed char . Then, when printing the elements of such an array to the console with operator << , you will run into difficulties, since this operator will try to print the integer values as characters. For example, if you declared the array

 int8_t a[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9 }; 

then when you output it to the console

 for ( int8_t x : a ) std::cout << x; 

Strange characters will appear on the console screen, or, when the number 9 is printed, nothing will appear at all and the cursor will jump forward several positions, since the value 9 is treated by the output operator as the tab character.

Therefore, you will have to constantly remember that instead of the statement above you need to write

 for ( int x : a ) std::cout << x; // note: int, not int8_t

All this can become a source of hard-to-find errors.

Here is another C++ example that can lead to an error. Suppose you have the following declarations

 unsigned short a = 5;
 unsigned short b = 10;
 std::cout << a - b << std::endl;

 unsigned int x = 5;
 unsigned int y = 10;
 std::cout << x - y << std::endl;

As you can see, all the variables have unsigned integer types and the same values. The question: is the console output of these two output statements

 std::cout << a - b << std::endl;
 std::cout << x - y << std::endl;

the same? :) Check it yourself. :)
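(Spoiler: the unsigned short operands are promoted to int, so the first statement prints -5, while the unsigned int subtraction wraps around to 4294967291.) C#, the language of the question, has a very similar pitfall; a minimal sketch, assuming the default unchecked context:

 ushort a = 5, b = 10;
 uint x = 5, y = 10;

 // ushort operands are promoted to int, so a - b is the int value -5.
 Console.WriteLine(a - b);  // -5

 // uint - uint stays uint and wraps around.
 Console.WriteLine(x - y);  // 4294967291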

There is a trade-off: more memory, more execution speed; less memory, less execution speed.

You can optimize your program based on the criteria that are your priority.

But you should engage in optimization only when you know exactly the cause and location of the program's unsatisfactory behavior.

  • "There is a trade-off: more memory, more execution speed; less memory, less execution speed" - not always. See my comment on @VladD's answer - avp
  • @avp This is always the case, because at the very least, with more memory you can run code that was written for less memory. The opposite is not true. :) - Vlad from Moscow
  • Ah, you mean it in that sense (the amount of available memory, or even of memory installed in the computer). I meant the memory occupied by the program's variables. - avp

On micro-optimization (saving on matches, as the saying goes).

In the expression for (int i = 0; i < someVar.Length; i++) { ... } there is a single memory allocation for the counter and a single release after the loop exits. Even if there are many such blocks, each of them produces just one allocation and one release. Things change only with nested loops, but there, too, we are talking about single allocations that are released immediately. As a result, no matter how many such blocks your program has, at any given moment it would save on average 4-7 bytes on them (12-21 in the case of two or three nested loops) while losing a fair amount of performance. Ask yourself: given the above, is this handful of bytes important to the program?

With arrays, do not confuse their length with the type of the data they store. The length in the square brackets is passed as an int and, again, is a single memory allocation, while the element type of the array is chosen according to the data stored in it ( Int16 , Int32 , Int64 ). Although, if there is not much data and memory is certainly sufficient, you can sometimes simply settle for the usual Int32 .
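To put rough numbers on it (payload only; each .NET array object also carries some runtime overhead, such as a type pointer and a length field):

 Console.WriteLine(sizeof(int) * 8);  // 32 bytes of int elements
 Console.WriteLine(sizeof(byte) * 8); // 8 bytes of byte elements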

In any case, your program is sure to contain plenty of other places far more critical in terms of memory allocation and speed than the ones above.

    the array contains int values

     int[] arr = new int[8]; 

    the array contains UInt16 ( ushort ) values

     ushort[] arr = new ushort[8]; 

    the array contains values of type byte

     byte[] arr = new byte[8]; 

    You presented the examples incorrectly: these declarations differ in element type, not in length. And as @BOPOH correctly noted, the Length property is an int , and there is also LongLength , which is a long ( Int64 ).
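    A minimal sketch of the point about the two length properties:

     var arr = new byte[8];
     int n = arr.Length;        // Length is an int (Int32)
     long big = arr.LongLength; // LongLength is a long (Int64)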