Static linking
If you think about it, calling someone else's functions by name is not an easy problem. It is solved by introducing an intermediate level between source code and executable code: object code. Simplifying somewhat, a program goes through two build stages: compilation into object modules, and linking.
An object module contains machine code and, in addition, the names and addresses of the objects it defines: procedures and data structures.
Object module formats vary. For DOS, the format was developed by Intel long ago, back for the x86 architecture. Developers of different languages and environments more or less adhered to it, so in theory libraries produced by different compilers, and even written in different languages, could be linked together.
A notable exception was Turbo Pascal and Borland Delphi, which used their own object formats (*.dcu in Delphi's case).
UNIX and Windows used COFF, a standard that is not tied to a particular architecture. In the UNIX world, ELF has since become entrenched. Small microprocessor systems use IEEE 695.
Beyond the format, a set of conditions must hold for modules to be compatible with each other: for example, calling conventions (how parameters are passed), the bitness of the architecture, the memory model, and so on.
Now for how it works. The compiler expects every function used in a source module to be implemented in that module. If it is not, the compiler needs a declaration of the function: what parameters it takes and what it returns. This information lets the compiler generate code that calls the function with those parameters at an address that is not yet known.
While generating code, the compiler records in the object module every place where an external function is called. Once the function's address becomes known, it has to be patched into all of those places.
Internal functions (those implemented in the module) are called at a known address, but that address is relative: it is counted from the beginning of the module.
All functions and data not marked static are visible from other modules; that is, the object module stores a table of their names and relative addresses. Other modules can therefore not only call these functions but also access these arrays and variables.
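As a rough illustration of declarations and visibility, here is a minimal sketch of one source module. All names in it (sum_array, call_limit, clamp_to_limit, limited_sum) are hypothetical and chosen only for this example. To keep it self-contained, sum_array is defined at the bottom of the same file; in a real program it could just as well sit in another *.c file, and the declaration at the top is all the compiler would see.

```c
#include <stddef.h>

/* Declaration only: it tells the compiler the parameter and return types,
   so a call can be generated before the address is known. */
int sum_array(const int *values, size_t count);

/* Not marked static: the name goes into the module's symbol table, and
   another module could reach it with `extern int call_limit;`. */
int call_limit = 100;

/* Marked static: the name stays private to this module, so other
   modules cannot refer to it. */
static int clamp_to_limit(int value)
{
    return value > call_limit ? call_limit : value;
}

/* Exported function that uses both the private helper and sum_array. */
int limited_sum(const int *values, size_t count)
{
    return clamp_to_limit(sum_array(values, count));
}

/* The definition. It is placed here only so the example builds as a
   single file (an assumption of this sketch). */
int sum_array(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

Calling limited_sum on an array whose elements add up to more than call_limit returns call_limit; otherwise it returns the plain sum.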
Further, if you are building a library of functions, such as the standard library, it is convenient to merge its hundreds of object modules into one file, called a library file. Importantly, when a program is built against a library file, only the object modules it needs are pulled in, not all of them.
Now, watch closely: when you build a program, you state explicitly what it consists of. You list all the *.c files, all the *.o or *.obj files, and all the *.lib or *.a files. The compiler creates object modules from the source files and then passes all of them, together with the ones you listed explicitly, to the linker.
The linker builds the executable. It places the object modules one after another and recalculates addresses so that they are counted from the beginning of the executable image. An object module is the smallest unit of linking: you cannot take just one function or one array from it, only everything together. From libraries, however, only the required object modules are selected.
The linker also resolves references, that is, it matches names and writes in the addresses. No type information is left at this stage, so the linker cannot check that parameters match or anything of the kind. It only checks names.
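The two linker steps described above (placing modules one after another, then matching names to addresses) can be sketched as a toy model. This is not how any real linker is implemented; the struct layouts and the function names lay_out and resolve are assumptions made up for this illustration, and real object formats carry much more information.

```c
#include <stddef.h>
#include <string.h>

/* An exported name and its address relative to the start of its module. */
struct symbol {
    const char *name;
    size_t offset;
};

/* A toy object module: its size and its table of exported names. */
struct module {
    const char *file;
    size_t size;                    /* bytes of code and data */
    const struct symbol *exports;
    size_t export_count;
    size_t base;                    /* filled in by lay_out() */
};

/* Step 1: place the modules one after another and remember where each
   one starts. Returns the total size of the image. */
size_t lay_out(struct module *mods, size_t n)
{
    size_t next = 0;
    for (size_t i = 0; i < n; i++) {
        mods[i].base = next;
        next += mods[i].size;
    }
    return next;
}

/* Step 2: resolve a reference by name only (no types are involved).
   Returns 1 and stores the address counted from the start of the image,
   or returns 0 if no module exports the name. */
int resolve(const struct module *mods, size_t n,
            const char *name, size_t *address)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < mods[i].export_count; j++) {
            if (strcmp(mods[i].exports[j].name, name) == 0) {
                *address = mods[i].base + mods[i].exports[j].offset;
                return 1;
            }
        }
    }
    return 0;
}
```

With two modules of sizes 0x40 and 0x30, a symbol at relative offset 0x10 in the second module resolves to 0x50 in the image. A name that nobody exports is the toy equivalent of an "undefined reference" error.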
The result is an executable file that the loader loads when the program starts. The loader, in turn, may convert some relative addresses into absolute ones; whether it needs to depends on the operating system, the architecture, and so on. I mention this because this step becomes important for dynamic linking.
A brief digression. If the object modules produced by different compilers are compatible, then even modules written in different languages can be linked together. Mixing C and C++ is quite common and simple. But you can also combine Fortran and C, or Pascal and C. In that case you usually have to specify calling conventions explicitly, using odd keywords such as _cdecl, _stdcall, and so on.
If a C function calls another C function defined in a different source file, both use the same convention by default, and you don't need to worry about it.
Dynamic linking
The procedure described above has one major drawback: every object module a program needs is embedded in its executable. The standard library is used by all programs, so it ends up on disk in dozens or hundreds of copies.
Ideally, there would be a single copy, and programs would load its object modules from one standard location.
Such dynamic link libraries are widespread in all modern operating systems. On Windows they are *.dll files, and on Linux *.so files.
In essence these are the same library files, except that they are loaded as a whole. The final binding of addresses is performed by the loader rather than the linker. The algorithm is roughly the same: in the simplest case, all the dynamic libraries an application needs are loaded as soon as it starts, and the loader writes the real function addresses into every call site.
Naturally, such a library file explicitly records the names of all exported functions and data.
When and what is used
You might think that static and dynamic linking are mutually exclusive, but they are not. You can link some libraries statically and others dynamically. You can even load dynamic libraries explicitly from your program, but then you cannot call their functions directly, only through a function pointer.
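Explicit loading could look roughly like the sketch below. It assumes a POSIX system that provides dlopen, dlsym and dlclose in <dlfcn.h>; on older glibc versions the program must also be linked with -ldl. The helper name call_from_library and the type name unary_math_fn are hypothetical, and the library is assumed to export a function of the form double f(double).

```c
#include <dlfcn.h>   /* dlopen, dlsym, dlclose, dlerror (POSIX) */
#include <stdio.h>

/* Pointer type matching the function we expect to find in the library. */
typedef double (*unary_math_fn)(double);

/* Hypothetical helper: loads `library` at run time, looks up `symbol`,
   calls it with x and stores the result in *result.
   Returns 0 on success and -1 on failure. */
int call_from_library(const char *library, const char *symbol,
                      double x, double *result)
{
    void *handle = dlopen(library, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    /* dlsym returns a plain void *; it is converted to a function pointer
       because the name was not known to the linker at build time. */
    unary_math_fn fn = (unary_math_fn)dlsym(handle, symbol);
    if (fn == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }

    *result = fn(x);   /* the call goes through the pointer */
    dlclose(handle);
    return 0;
}
```

On a typical Linux system with glibc, call_from_library("libm.so.6", "cos", 0.0, &r) would be one way to try it; the file name libm.so.6 is an assumption about that platform and differs elsewhere.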
There are many options, but yours is a simple case.
When you call malloc, the compiler knows from its command-line arguments that dynamic linking is involved (your mention of *.so files suggests as much to me). This means that your executable, when loaded, will require certain dynamic libraries, which are listed inside it along with the function names.
The loader will find these libraries, load them into the program's address space, and write the correct addresses into all the call sites. When you call malloc, the function is called at a direct address, without any function pointers, which is the fastest way.
Disclaimer
All of this is presented in a generalized form. Particular formats, operating systems, and compilers may have additional conventions.
For example:
Object modules include debugging information.
Object modules may hold a partial result of compilation. In C++, for example, this is needed to “hide” the source code of generic (template) classes. Naturally, all of this is non-standard. When I dealt with it, there were a few almost experimental compilers that supported it. I don't know how things stand now.
In the TopSpeed compiler system, the set of formats was more complex than the one described here, because the developers wanted to reuse the same tools across languages as much as possible; for example, they used a common low-level optimizer. This means that shared files were used up to the level of object modules.
DOS and Windows have the *.com executable format. It requires no address fix-ups: it is enough to load the file into RAM and transfer control to its first instruction, and it works right away. This is possible thanks to the segmented memory organization of the x86 architecture and the fact that a *.com file must fit in one segment.
So in particular cases things may be more complicated or simpler than described here. For specific questions it is better to consult the documentation.