I am studying how structures are passed to functions and returned from them. Here is the code:

    #include <stdio.h>

    struct point {
        int x;
        int y;
    };

    struct point makepoint(int x, int y)
    {
        struct point tmp;
        tmp.x = x;
        tmp.y = y;
        return tmp;
    }

    int main()
    {
        struct point pt = makepoint(1, 2);
        struct point *pp = &pt;
        return 0;
    }

Here are the disassembled listings of both functions. I will try to comment on them.

main function:

    0x080483e2 <+0>:  push ebp
    0x080483e3 <+1>:  mov ebp,esp
    0x080483e5 <+3>:  sub esp,0x10            ; allocated 16 bytes for the structure
    0x080483e8 <+6>:  lea eax,[ebp-0xc]
    0x080483eb <+9>:  push 0x2
    0x080483ed <+11>: push 0x1
    0x080483ef <+13>: push eax                ; passed the structure's address as a hidden parameter
    0x080483f0 <+14>: call 0x80483bb <makepoint>
    0x080483f5 <+19>: add esp,0x8             ; free only 8 bytes, the rest are occupied by the structure
    0x080483f8 <+22>: lea eax,[ebp-0xc]
    0x080483fb <+25>: mov DWORD PTR [ebp-0x4],eax
    0x080483fe <+28>: mov eax,0x0
    0x08048403 <+33>: leave
    0x08048404 <+34>: ret

Note that the main function allocates 16 bytes on the stack, but when the stack is cleaned up after the call, only 8 bytes are freed. The remaining 8 bytes are occupied by the structure.

makepoint function:

    0x080483bb <+0>:  push ebp
    0x080483bc <+1>:  mov ebp,esp
    0x080483be <+3>:  sub esp,0x10
    0x080483c1 <+6>:  mov eax,DWORD PTR [ebp+0xc]
    0x080483c4 <+9>:  mov DWORD PTR [ebp-0x8],eax
    0x080483c7 <+12>: mov eax,DWORD PTR [ebp+0x10]
    0x080483ca <+15>: mov DWORD PTR [ebp-0x4],eax
    0x080483cd <+18>: mov ecx,DWORD PTR [ebp+0x8]
    0x080483d0 <+21>: mov eax,DWORD PTR [ebp-0x8]
    0x080483d3 <+24>: mov edx,DWORD PTR [ebp-0x4]
    0x080483d6 <+27>: mov DWORD PTR [ecx],eax
    0x080483d8 <+29>: mov DWORD PTR [ecx+0x4],edx
    0x080483db <+32>: mov eax,DWORD PTR [ebp+0x8]
    0x080483de <+35>: leave
    0x080483df <+36>: ret 0x4

Look again at this place in main :

    0x080483eb <+9>:  push 0x2
    0x080483ed <+11>: push 0x1
    0x080483ef <+13>: push eax
    0x080483f0 <+14>: call 0x80483bb <makepoint>

Taking into account the return address and the saved ebp, the address of the structure (push eax) should end up offset by 3 dwords, i.e. 0xC bytes, and apparently this is where the structure's address is read:

 mov eax,DWORD PTR [ebp+0xc] 

What happens next in makepoint? I find this code difficult to understand.

In general, as I understand it: when a function passes a structure to another function, or when another function returns a structure, the calling function itself allocates space for this structure. It then passes only the address of this structure, as a hidden parameter.

There is also such a thing as alignment of the structure fields. Can this be observed in the listing? What is the reason for this alignment?
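As a rough illustration of the alignment question, here is a minimal sketch. The struct `mixed` and both helper functions are hypothetical and not part of the original code. Exact offsets are implementation-defined, so no concrete numbers are claimed here; on typical x86 ABIs the compiler inserts padding after `c` so that `i` starts at a multiple of its alignment, because aligned accesses are what the hardware handles best. In `struct point` both fields are `int`, so no padding is expected between them, which would explain why nothing alignment-related is visible for the structure itself in the listing.

```c
#include <stddef.h>  /* offsetof, size_t */

struct point { int x; int y; };

/* Hypothetical struct, for illustration only: a char followed by an int. */
struct mixed { char c; int i; };

/* Padding bytes the compiler inserted between c and i. */
size_t mixed_padding(void)
{
    return offsetof(struct mixed, i) - sizeof(char);
}

/* Padding bytes between x and y in struct point. */
size_t point_padding(void)
{
    return offsetof(struct point, y) - sizeof(int);
}
```

Printing both values with `printf("%zu\n", ...)` on the target in question shows where the compiler actually placed the fields.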


    Inside the makepoint function, the following happens:

    The local tmp structure is located at ebp-0x8. Accordingly, the code

        // Copy x
        0x080483c1 <+6>:  mov eax,DWORD PTR [ebp+0xc]
        0x080483c4 <+9>:  mov DWORD PTR [ebp-0x8],eax
        // Copy y
        0x080483c7 <+12>: mov eax,DWORD PTR [ebp+0x10]
        0x080483ca <+15>: mov DWORD PTR [ebp-0x4],eax

    is nothing more than filling the fields of the local structure from DWORD PTR [ebp+0xc] and DWORD PTR [ebp+0x10] (the parameters x and y respectively), i.e. this is

        tmp.x = x;
        tmp.y = y;

    After that, the value of the local structure is simply copied into the external recipient structure, whose address is in DWORD PTR [ebp+0x8]

        // Fetch the recipient's address and put it in ecx
        0x080483cd <+18>: mov ecx,DWORD PTR [ebp+0x8]
        // Copy tmp into the recipient
        0x080483d0 <+21>: mov eax,DWORD PTR [ebp-0x8]
        0x080483d3 <+24>: mov edx,DWORD PTR [ebp-0x4]
        0x080483d6 <+27>: mov DWORD PTR [ecx],eax
        0x080483d8 <+29>: mov DWORD PTR [ecx+0x4],edx

    Then, in accordance with the calling convention, the recipient's address is also returned in eax

     0x080483db <+32>: mov eax,DWORD PTR [ebp+0x8] 
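The steps above can be sketched in C as if the hidden parameter were written out explicitly. This is only a model of what the unoptimized listing does, not something the compiler literally generates; the name `makepoint_lowered` and the parameter `result` are made up for illustration.

```c
struct point { int x; int y; };

/* Hypothetical hand-written equivalent of the unoptimized makepoint:
   `result` plays the role of the hidden pointer found at [ebp+0x8]. */
struct point *makepoint_lowered(struct point *result, int x, int y)
{
    struct point tmp;   /* local copy at [ebp-0x8]               */
    tmp.x = x;          /* x is read from [ebp+0xc]              */
    tmp.y = y;          /* y is read from [ebp+0x10]             */
    *result = tmp;      /* copy tmp into the caller's structure  */
    return result;      /* recipient's address goes back in eax  */
}
```

A caller would then write `struct point pt; makepoint_lowered(&pt, 1, 2);`, which roughly corresponds to the `lea eax,[ebp-0xc]` and `push eax` pair in main.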

    Your assumption about

        0x080483f5 <+19>: add esp,0x8 ; free only 8 bytes, the rest are occupied by the structure

    is not correct (if I understood it correctly).

    Before the makepoint call, the values of x and y and a pointer to the structure that receives the result were pushed onto the stack. makepoint returned with ret 4. That instruction removed the address of the recipient structure from the stack, but left the x and y arguments there. The add esp,0x8 instruction removes these x and y arguments, which together occupy 8 bytes.
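This bookkeeping can be checked with simple arithmetic. The sketch below is only a model, assuming 4-byte stack slots as in this 32-bit listing; the function name and the idea of tracking esp as a signed offset are hypothetical. The `push ebp` and `leave` inside makepoint cancel each other out, so they are left out.

```c
/* Model of how esp moves around the makepoint call, as a byte offset
   relative to its value just before the first push (the stack grows down). */
int esp_offset_after_call(void)
{
    int esp = 0;
    esp -= 4;      /* push 0x2        (argument y)                      */
    esp -= 4;      /* push 0x1        (argument x)                      */
    esp -= 4;      /* push eax        (hidden pointer to the result)    */
    esp -= 4;      /* call makepoint  (pushes the return address)       */
    esp += 4 + 4;  /* ret 0x4: pops the return address plus 4 more bytes */
    esp += 8;      /* add esp,0x8: the caller removes x and y           */
    return esp;    /* back to the starting value                        */
}
```

The pushes and pops add up to the same 16 bytes, so the local area reserved by `sub esp,0x10` is never touched by this sequence.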

    • Regarding the last addition: interesting, it turns out to be a mixture of cdecl and stdcall. - insolor
    • "Your assumption ... is not correct." Then where is the recipient structure to which the pointer is passed? This is how I understand it: in the main function, 16 bytes are allocated for the structure that will be returned from the function. The number 16 comes from alignment. Once the structure allocated in main has been filled, the extra allocated space (8 bytes) is freed. Only the 8 bytes for the 2 structure fields remain. I have read everything through, and I still do not understand where the structure itself is, the one referred to at ebp + 0x8. It looks like all this stack data needs to be drawn out. - typemoon
    • @typemoon: What do you mean, where? The recipient structure sits at ebp-0xc in main. In main, 16 bytes are allocated for the variables pp and pt. The first is at ebp-0x4 and the second at ebp-0xc. That is 12 bytes in total. 16 is indeed chosen for alignment. There are 4 "extra" bytes, not 8 as you mistakenly think. And again, add esp,0x8 has nothing to do with this area of memory. add esp,0x8 compensates for push 0x2 and push 0x1, which put 8 bytes on the stack. Those 8 bytes are still on the stack after the makepoint call, and this is where they are cleaned up. - AnT
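The arithmetic from the last comment, as a small sketch. It assumes the i386 sizes seen in the listing (4-byte int, 4-byte pointer) and that the local area is rounded up to a multiple of 16 bytes; the helper `round_up` and the constant names are hypothetical.

```c
/* Round n up to the next multiple of align (align must be > 0). */
unsigned round_up(unsigned n, unsigned align)
{
    return (n + align - 1) / align * align;
}

enum {
    PT_SIZE = 8,  /* struct point pt: two 4-byte ints, starting at ebp-0xc */
    PP_SIZE = 4   /* struct point *pp: one 4-byte pointer, at ebp-0x4      */
};

/* Bytes actually used by main's local variables. */
unsigned locals_used(void)
{
    return PT_SIZE + PP_SIZE;
}

/* Bytes reserved by `sub esp,0x10` under the rounding assumption. */
unsigned locals_reserved(void)
{
    return round_up(locals_used(), 16);
}
```

Under these assumptions the difference between reserved and used space is the 4 "extra" bytes mentioned in the comment.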

    You started your study from the wrong end. The compiler may, at its own discretion, do whatever it wants with your structure. However, there are so-called calling conventions, which define the rules for passing parameters. In your place, I would start by studying that topic. It is better to look for material in English right away (the term there is "calling convention"). In addition, you probably disassembled a debug build, and with full optimization the code may turn out very, very different. Moreover, even alignment can be handled by the compiler in various ways. The question of who allocates memory and who frees it is also settled differently from case to case. For example, under the __fastcall convention it is the called function that cleans up the stack.

    In short, I just want to say that studying the listing is the worst way to understand what the compiler does, because it can do everything differently under different conditions. You need to read the documentation.

      My own small investigation of the question.

      I add output to the code so that the optimizer does not throw away the filling of the structure altogether:

          #include <stdio.h>

          struct point {
              int x;
              int y;
          };

          struct point makepoint(int x, int y)
          {
              struct point tmp;
              tmp.x = x;
              tmp.y = y;
              return tmp;
          }

          int main()
          {
              struct point pt = makepoint(1, 2);
              struct point *pp = &pt;
              printf("%p: %d, %d", pp, pt.x, pt.y);
              return 0;
          }

      Compiling with gcc at the -O1 optimization level ( gcc -O1 -c testmakestruct.c -S -masm=intel ), I get the following code:

          _makepoint:
          LFB7:
              .cfi_startproc
              mov eax, DWORD PTR [esp+4]
              mov edx, DWORD PTR [esp+8]
              ret
              .cfi_endproc
          ; ...
          _main:
              ; ...
              ; Fill the structure fields (the inlined makepoint function):
              mov DWORD PTR [esp+24], 1
              mov DWORD PTR [esp+28], 2
              ; Place the field values on the stack for printf as immediate operands
              mov DWORD PTR [esp+12], 2
              mov DWORD PTR [esp+8], 1
              lea eax, [esp+24]
              mov DWORD PTR [esp+4], eax ; store the pointer to the structure on the stack
              mov DWORD PTR [esp], OFFSET FLAT:LC0 ; "%p: %d, %d"
              call _printf
              mov eax, 0

      The part of the generated code at the beginning of the main function is omitted for simplicity.

      With -O2 the result changes only slightly.

      Interestingly, the makepoint function is present in the generated code, but it is not called from main.

      The assembly code given in the question was clearly produced by compiling without optimization; otherwise the compiler would have seen that the structure is not used at all and would happily have thrown away both its initialization and the structure itself.

      Furthermore, because optimization is disabled, the compiler generates rather suboptimal assembly code for the makepoint function, which is generally not worth figuring out (except out of sporting interest), since in the real world the generated assembly is likely to be simpler and easier to understand.

      • This all seems complicated. In the sense that if you need to recover the source code from the assembly, it is hard to guess that a structure is used here. - typemoon
      • Although, probably, from the presence of lea [esp + 24], a number being stored at that same address and another number at [esp + 24 + 4], you can guess it. - typemoon
      • @typemoon, in general it is impossible to unambiguously recover high-level source code from assembly or machine code. Structures can be identified by indirect signs. Yes, a pointer is used here, which may hint that this is a structure. But there is still ambiguity: how do you tell, for example, whether this was a structure with two fields of the same type or an array of two elements? - insolor
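To illustrate the ambiguity from the last comment, here is a minimal sketch showing that a struct of two ints and an array of two ints typically occupy memory identically, so the machine code that fills them looks the same. The helper name is hypothetical, and identical layout is an assumption about common ABIs rather than a guarantee of the C standard (a compiler is allowed to add trailing padding to a struct).

```c
#include <string.h>  /* memcmp */

struct point { int x; int y; };

/* Returns 1 if a struct point {1, 2} and an int[2] {1, 2} have the same
   size and the same bytes in memory, 0 otherwise. */
int same_representation(void)
{
    struct point pt = { 1, 2 };
    int arr[2] = { 1, 2 };

    if (sizeof pt != sizeof arr)
        return 0;
    return memcmp(&pt, arr, sizeof pt) == 0;
}
```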

      Formally, the language does not determine in any way exactly how (in which machine commands) the compiler will perform certain actions ...

      I would not be surprised if, for

          struct point pt = makepoint(1, 2);

      the compiler immediately wrote to the place in memory where pt is located, without creating a local tmp. In C++ this is a standard optimization, i.e. one specified in the standard. And at a quick glance at the code, that seems to be what it does. However, I am only sight-reading the assembly :(
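What writing directly into pt might look like, again spelled out with an explicit pointer. This is a sketch of the idea only, not the compiler's actual output; `makepoint_direct` and `result` are hypothetical names.

```c
struct point { int x; int y; };

/* Hypothetical variant without the local tmp: the fields are stored
   straight into the caller's structure through the hidden pointer. */
void makepoint_direct(struct point *result, int x, int y)
{
    result->x = x;
    result->y = y;
}
```

Compared with a version that fills tmp first and then copies it, this saves the intermediate copy; the caller still provides the storage.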