class MyClass { MyClass(); int a; int b; void MyMethod(); }; 

I have always been tormented by one question: what does a high-level compiler's parser do when it sees a construct like this? Does it write a type description, with method addresses and variable sizes/types, into some table inside the program file? And what happens when you create an object of the class on the stack?

 MyClass Object; 

Are the sizes and types of the variables extracted from this table and given appropriate addresses, is the address of the constructor placed on the call stack, and does initialization then proceed according to what is written in the constructor? Where can I read more about this process?

  • If you want to know how it all works, buy the Dragon Book. - KoVadim
  • Oh, thanks, I'll take a look! 750 pages, but it looks like a comprehensive guide. - igumnov
  • @KoVadim I followed the link and barely managed to get out of the recursion. - Costantino Rupert
  • I added a normal link, but something strange got inserted. Still, the phrase "compilers, dragon book" googles just fine. - KoVadim
  • I'll just leave it here: padabum.com/d.php?id=2169 - igumnov


3 answers

In general, all of this depends heavily on the compiler. A smart compiler may even throw the class away entirely if it is not needed. Still, there are some general principles.

Let's start

When the compiler sees a class definition, it simply parses it. The interesting part starts when an instance of the class has to be created. From the class description, the compiler calculates how much memory the object needs. For the class in the question that is at least 8 bytes (from here on I am assuming a 32-bit x86 platform). It may actually be more: for example, if the class has virtual functions, add at least another 4 bytes for a pointer to the virtual function table. So, roughly speaking, new for a class is just a memory allocation (zero-filled only for some forms of initialization) followed by a constructor call if the constructor is not trivial.

For classes without virtual methods, memory is usually laid out as for a plain struct with the same fields. A class with virtual methods may have at least one more hidden field.
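To make the size argument concrete, here is a minimal sketch. The type names Plain and WithVirtual are hypothetical stand-ins for the class from the question, and the exact numbers depend on the compiler and platform; only the "at least two ints" bound is guaranteed by the language.

```cpp
#include <cstddef>   // std::size_t

// Hypothetical stand-in for MyClass without virtual functions:
// laid out like a plain struct holding two ints.
struct Plain {
    int a;
    int b;
    void method() {}            // member functions take no space in the object
};

// Same fields plus one virtual function. Typical compilers add a hidden
// pointer to the virtual function table (an implementation detail).
struct WithVirtual {
    int a;
    int b;
    virtual void method() {}
};

// Guaranteed: the object must be able to hold both ints.
static_assert(sizeof(Plain) >= 2 * sizeof(int), "two ints must fit");

// Object sizes in bytes as seen by the current compiler.
constexpr std::size_t plain_size()   { return sizeof(Plain); }
constexpr std::size_t virtual_size() { return sizeof(WithVirtual); }
```

On a 32-bit x86 build one would typically see 8 and 12 here; on a 64-bit build the second number is usually 16 because the hidden pointer is 8 bytes wide.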

And what are "methods", the functions of a class?

They are perfectly ordinary functions that simply take one extra parameter (nothing forbids the compiler from adding more, but usually it is just one), which is in fact a pointer to the memory allocated earlier. Inside the method, this parameter is visible as this.
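The idea of a hidden extra parameter can be sketched by writing the same operation twice: once as a member function and once as a free function that takes the object pointer explicitly. Counter and Counter_add are hypothetical names chosen for illustration; the free function is an analogy, not what any particular compiler emits.

```cpp
#include <cassert>

struct Counter {
    int value;
    void add(int n) { value += n; }     // `this` is passed implicitly
};

// Hand-written equivalent: the object pointer is an ordinary first parameter.
void Counter_add(Counter* self, int n) { self->value += n; }

int add_both_ways() {
    Counter c{0};
    c.add(10);               // the compiler passes &c as the hidden argument
    Counter_add(&c, 20);     // same effect with the pointer spelled out
    assert(c.value == 30);   // both calls modified the same object
    return c.value;
}
```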

How to access the fields:

For each field, the compiler calculates the offset relative to this. For example, we have:

 MyClass m; m.a = 10; m.b = 20; 

In pseudo-assembly this looks like:

 mov [this+0], 10
 mov [this+4], 20

The offset is +4 because the size of int is 4. The compiler may also insert alignment padding, in which case the second field could actually end up at offset 8.
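The offsets the compiler computes can be inspected with the standard offsetof macro. The sketch below uses two hypothetical structs: Fields mirrors the two ints from the question, and Padded shows a case where alignment padding is likely. Concrete values other than the first offset are platform dependent.

```cpp
#include <cstddef>   // offsetof, std::size_t

struct Fields {      // hypothetical standard-layout stand-in for MyClass
    int a;
    int b;
};

struct Padded {      // a char followed by a double usually forces padding
    char   tag;
    double value;
};

// Offsets are compile-time constants, exactly as the answer describes.
constexpr std::size_t offset_a     = offsetof(Fields, a);      // always 0
constexpr std::size_t offset_b     = offsetof(Fields, b);      // typically 4
constexpr std::size_t offset_value = offsetof(Padded, value);  // typically 4 or 8
```

Because these are constants, an access such as m.b compiles down to "base address plus offset_b" with no table lookup at run time.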

Calling methods:

This works the same way as calling ordinary functions. As I wrote above, one more parameter is added: a pointer to the object. The compiler knows the address of the method.

Calling virtual methods:

These are more interesting. A method table is used for them (it seems nothing better has been invented yet; you can read more details here). The compiler maps the name of each virtual function to its index in the table. When a call has to be made, the code looks like this:

 mov eax, [this+8]             ; address of the method table
 mov eax, [eax + method_index] ; load the address of the method
 push parameter
 push this
 call eax                      ; call the function at that address

But the compiler can cheat. If it can determine exactly which method will be called, it can emit a direct call. Moreover, the compiler may not even pass the this parameter if it is not used inside the method.

I think it is clear that one virtual method table is created per class, not per object.
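The table mechanism can be imitated by hand with plain function pointers. This is only a sketch of the idea: Shape, ShapeVTable and the other names are hypothetical, and real compilers choose their own layout for the table and for the hidden pointer.

```cpp
#include <cassert>

struct Shape;                                  // hypothetical example type

// One table per "class": a struct of function pointers.
struct ShapeVTable {
    int (*area)(const Shape* self);
};

// Every object carries only a pointer to the table of its class.
struct Shape {
    const ShapeVTable* vptr;
    int w;
    int h;
};

int rect_area(const Shape* self)     { return self->w * self->h; }
int triangle_area(const Shape* self) { return self->w * self->h / 2; }

const ShapeVTable rect_vtable     = { rect_area };
const ShapeVTable triangle_vtable = { triangle_area };

// The "virtual call": load the table, load the slot, call through it.
int call_area(const Shape* s) { return s->vptr->area(s); }

int demo_dispatch() {
    Shape r = { &rect_vtable, 4, 6 };
    Shape t = { &triangle_vtable, 4, 6 };
    assert(r.vptr != t.vptr);                  // different classes, different tables
    return call_area(&r) + call_area(&t);      // 24 + 12
}
```

call_area does not know which concrete function it will reach; that is decided by the table pointer stored in the object, which matches the two loads followed by an indirect call in the pseudo-assembly.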

Creating objects on the stack

There is nothing special here. In the classic "allocate memory on the stack" implementation, the compiler simply moves the stack pointer. Since the stack grows downward, this means subtracting the object's size from the register that holds the top of the stack. In C there is even a function for this, alloca (non-standard; in Visual Studio it may be called _alloca), which works like malloc but allocates on the stack.
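The split between "reserve bytes" and "run the constructor on them" can be shown explicitly with placement new. This is a sketch under assumptions: Probe is a hypothetical type that records the this pointer it was constructed with, and the local byte array stands in for the space the compiler would reserve in the stack frame.

```cpp
#include <cassert>
#include <new>       // placement new

struct Probe {                          // hypothetical example type
    const void* seen_this;              // address the constructor was given
    int value;
    explicit Probe(int v) : seen_this(this), value(v) {}
};

bool constructed_in_place() {
    // Step 1: reserve raw, suitably aligned bytes in the current stack frame.
    alignas(Probe) unsigned char storage[sizeof(Probe)];

    // Step 2: run the constructor on that memory; `this` is its address.
    Probe* p = new (storage) Probe(42);
    assert(static_cast<void*>(p) == static_cast<void*>(storage));

    bool same_address = (p->seen_this == static_cast<const void*>(storage));
    bool initialized  = (p->value == 42);

    p->~Probe();                        // explicit destructor call ends the lifetime
    return same_address && initialized;
}
```

For an ordinary local variable the compiler performs both steps itself; writing them out by hand only makes the separation visible.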

Abstract methods

These methods have entries in the virtual method table, but the entries typically point to a special function that prints a message saying that such methods cannot be called.

Miscellaneous oddities

The resulting code usually contains no method or field names, only addresses and offsets. The types are gone as well. If a debugger needs to show data to the user, it gets a special file from the compiler (a map or debug-symbol file) in which all of this is spelled out. That is why, when debugging release code, the debugger often cannot even match the source code to the binary code: it simply does not have this information, and guessing is very hard.

Sometimes, though, compilers add extra fields to check that the code is not misbehaving, especially when producing debug builds. For example, they may store the real type of the object and compare against it when needed.

And sometimes the programmer wants to use RTTI, which also requires some extra data to be stored.
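As a small illustration of what that extra data makes possible, the sketch below asks for the dynamic type of an object through a base-class reference. Base and Derived are hypothetical names, and the example assumes RTTI has not been disabled with a compiler switch.

```cpp
#include <cassert>
#include <typeinfo>  // typeid

struct Base { virtual ~Base() {} };     // polymorphic, so RTTI data is attached
struct Derived : Base {};

bool rtti_sees_dynamic_type() {
    Derived d;
    Base& ref = d;                      // the static type is Base

    // Both queries consult type information stored alongside the object's
    // class, not anything known from the static type of `ref`.
    bool via_typeid = (typeid(ref) == typeid(Derived));
    bool via_cast   = (dynamic_cast<Derived*>(&ref) != nullptr);

    assert(typeid(ref) != typeid(Base));
    return via_typeid && via_cast;
}
```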

    At least three things happen:

    1. A block is allocated on the stack, large enough that the object is guaranteed to fit in it.
    2. Somewhere it is recorded that the destructor must be called when the block is exited.
    3. Control is passed to the constructor.

    No sizes or types of internal variables are loaded from any table. The calculation of addresses of internal variables occurs during compilation of the code that actually refers to them. The code that creates the object, oddly enough, just creates an object, and does nothing else.
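The three steps listed above can be observed from inside the program by logging when the constructor and destructor run. This is a minimal sketch: Tracer and the global trace string are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>
#include <string>

std::string trace;                      // hypothetical log of what happened

struct Tracer {
    Tracer()  { trace += "ctor;"; }
    ~Tracer() { trace += "dtor;"; }
};

std::string run_block() {
    trace.clear();
    {
        Tracer obj;                     // space is reserved, then the constructor runs
        trace += "body;";
    }                                   // the compiler-inserted destructor call happens here
    assert(!trace.empty());
    return trace;                       // "ctor;body;dtor;"
}
```

Nothing in this code looks up sizes or types at run time; the destructor call at the closing brace is placed there by the compiler when it translates the block.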

    • @Shamov Thank you, good answer. - igumnov

    You can generate an assembly listing from the source and look for yourself. In short, the compiler allocates the necessary memory for the object on the stack, then calls the constructor and passes it the address of that memory via this. Something like that.

    • That listing will contain 100 kilobytes of meta information such as PE headers, and besides, the class name will surely be replaced with some code, so I will never find anything in that hex garbage. - igumnov
    • Strange, for me the entire listing of a simple example ( pastebin.com/8x1AsSPk ) took 3 KB ( pastebin.com/Mh6zTJSG ) in the debug build. It is perfectly readable by eye. - fogbit
    • Yes, that is more or less readable. How did you get it? Did you run the compiler from the command line with the /FA switch? - igumnov
    • You can use /FAcs, or in the project tree you can right-click the cpp file -> Properties -> C/C++ -> Output Files -> Assembler Output and set the option you want there. After that, compile this cpp file, and a file with the same name and the extension ".cod" will appear in the Debug folder (if a debug build is being compiled). g++ has a similar option, but I don't remember it. - fogbit