What exactly gets called through the stdlib.h API?

Question

When I write C code like this:

#include <stdlib.h>

And further down, this:

void *ptr = malloc(1);

That should call the library's malloc, right? But where is its implementation? I can't find it anywhere. Is it called from a .so file, or something else? Can you explain how this works?

Accepted Answer · 2016-09-17T21:17:30

Static linking

Strictly speaking, calling functions written by other people by name is not a trivial problem. To solve it, an intermediate level was introduced between source code and executable code: object code. In simplified form, a program goes through two build stages: creating object modules, and linking them.

An object module contains machine code, but in addition it contains the names and addresses of the entities it defines: procedures and data structures.

Object module formats vary. For DOS, the format (OMF) was developed by Intel long ago for the x86 architecture. Developers of various languages and environments more or less followed it, so in theory one could link libraries produced by different compilers, and even written in different languages.

A notable exception was Turbo Pascal and Borland Delphi, which used their own format (in Delphi, *.dcu).

UNIX and Windows used COFF, a standard not tied to any particular architecture. Today ELF has become established in the UNIX world. Small microprocessor systems use IEEE 695.

Besides the format, there is also a set of conditions that modules must satisfy to be compatible with one another: for example, calling conventions (how parameters are passed), the bitness of the architecture, the memory model, and so on.

Now, how it works. A function called in a source module may be implemented in that same module. If it is not, the compiler expects the function to be declared: what parameters it takes and what it returns. This information lets the compiler generate code that calls the function with these parameters at an address that is "as yet unknown".

When generating code, the compiler records in the object module every place where the external function is called (these records are called relocations). Once the function's address becomes known, it has to be patched into all of these places.

Internal functions (those implemented in the module) are called at a known address, but this address is relative, that is, counted from the beginning of the module.

All functions and data not marked static are visible from other modules: the object module stores a table of their names and relative addresses (the symbol table). So you can not only call functions from other modules, but also access arrays and variables defined in them.

Next, if you build a library of functions, such as the standard library, it is convenient to merge its hundreds of object modules into one file, called a library file. Importantly, when a program is built against a library file, only the object modules it needs are taken from it, not all of them.

Now, the key point: when building a program, you state explicitly what it consists of. You list all the *.c files, all the *.o or *.obj files, and all the *.lib or *.a files. The compiler produces object modules from the sources and then passes all of them, together with the ones you specified explicitly, to the linker. (The standard C library is the exception: the compiler driver adds it automatically, which is why you never list it yourself.)

The linker builds the executable. It places all the object modules one after another and recomputes the addresses so that they are counted from the beginning of the executable image. An object module is the smallest unit of linking: you cannot take just one function or one array from it, only all of its contents together. From libraries, however, only the needed object modules are selected.

The linker also resolves references, that is, it matches names and writes in the addresses. At this stage there is no type information any more, so the linker cannot check whether the parameters match or anything of that kind; it only matches names.

The result is an executable file that the loader loads when the program starts. The loader, in turn, may convert some relative addresses into absolute ones; whether this is necessary depends on the operating system, the architecture, and so on. I mention it because this part becomes important for dynamic linking.

A brief digression. If the object modules produced by different compilers are compatible, then even modules written in different languages can be linked together. Combining C and C++ is quite common and simple. But you can also combine Fortran and C, or Pascal and C. In such cases you usually have to follow the calling conventions explicitly and use unusual keywords such as _cdecl, _stdcall, etc.

If you call a C function from another C function in a different source file, both use the same convention by default, so you don't need to worry about it.

Dynamic linking

The procedure described above has one major drawback: all the needed object modules are embedded in the executable. Since the standard library is used by every program, it ends up on disk in dozens or hundreds of copies.

Ideally, there should be a single copy, and programs should load the modules from one standard location.

Such dynamic link libraries are used in all modern operating systems: on Windows they are *.dll files, and on Linux *.so files.

In essence, these are the same library files, except that they are loaded as a whole. The final binding of addresses is performed not by the linker but by the loader. The algorithm is roughly the same: in the simplest case, when the application starts, all the dynamic libraries it needs are loaded immediately, and the loader writes the real function addresses into all the call sites.

Naturally, the names of all exported functions and data are stored explicitly in such a library file (on Linux, in the dynamic symbol table of the .so).

When each is used

You might think that static and dynamic linking are mutually exclusive, but they are not. You can link some libraries statically and others dynamically. You can even load dynamic libraries explicitly from your program, but then you cannot call their functions directly by name, only through a function pointer.

There are many options, but you have a simple case.

When you call malloc, the compiler (more precisely, the compiler driver that invokes the linker) knows from its command-line arguments that dynamic linking is meant: on Linux, gcc and clang link the C library dynamically by default unless you pass an option such as -static. That is why I mention *.so files. It means that when your executable is loaded, it requires certain dynamic libraries, which are listed inside it along with the names of the functions it uses.

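The loader finds these libraries, loads them into the program's address space, and writes the correct addresses into the call sites. So when you call malloc, the function is called at its real address, without any function pointers in your code. (On Linux with ELF the details differ slightly: the call usually goes through a small stub in the PLT, which jumps via an address that the loader stores in the GOT.)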

Disclaimer

All of this is presented in generalized form. Specific formats, operating systems, and compilers may have additional conventions.

For example:

Object modules may also contain debugging information.
Object modules may contain a partial result of compilation. For example, this is needed in C++ to "hide" the source code of generic (template) classes. Naturally, all of this is non-standard. When I dealt with it, only a few almost experimental compilers supported it. I don't know the current state of things.
In the TopSpeed compiler system, the file formats were more complicated than described here, because the developers wanted to reuse the same tools across different languages as much as possible; for example, they used a common low-level optimizer. This means that common file formats were used even at the stages before object modules.
DOS/Windows has the *.com executable format. It needs no address fixups: it is enough to load the file into memory and transfer control to its first instruction, and it runs immediately. This is possible thanks to the segmented memory organization of the x86 architecture and the fact that a *.com file must fit into a single segment.

So in some cases things may be more complicated, or simpler, than described here. For specific questions, it is best to consult the documentation.

Please clarify: 1. Is a *.lib essentially no different from a *.obj?
2. Does the application get the entire library code, or only the entities it calls?
3. When you wrote about the format of an object file, did you mean ABI?
@MikeAJ 1. A *.lib is essentially a collection of *.obj files; that is the difference.
2. An object module is the smallest unit of linking: if you reference one function from a module, you get the whole module.
From a library, however, you do not get the entire library, only the object modules you need.
So you can reduce program size by putting related functions that are often called together into one module, and unrelated functions into separate modules.
Object files produced by the front ends of several different languages are compatible in format and can be linked with each other correctly.
This is the same as your point: "It was more or less followed by developers of different languages and environments, so it was theoretically possible to link libraries created by different compilers and even from different languages."
I have no experience with LLVM, so the analogy may be wrong, because the bytecode is at a higher level than machine code; for example, it can include metadata.

Harry · Answer 2 · 2016-09-17T20:31:00

Roughly speaking: after the #include, the compiler knows that there is a function malloc that takes an argument of a certain type and returns a value of a certain type. So it can generate code that prepares the actual arguments, makes the call (this is where, roughly speaking, it puts just the name), and after the call it knows where and how to pick up the return value and what to do with it.

Now the linker comes into play. It sees a call by name and searches for this name in the libraries (well, first in your object files, but obviously it is not there). With static linking, it finds the function's code in the library, adds it to the executable, and replaces the call by name with a call to the corresponding address. With dynamic linking, it only inserts code that calls the function from the dynamic library.

Something like that.

So the implementation is either object code in a static library, or executable code in a dynamic library; in the latter case, what gets linked into your program is only an indication of where to look for it.

But in the end, on Linux, will such a call invoke malloc from the dynamic library or from a static one?
How does the compiler find the entity's name in the executable file, if a binary .so has no letters in it (or are there constant strings in the code section?)?
After @Mark's excellent answer, I guess you don't need me to answer anymore?
