I need a plain-language description of what the compiler does, written for someone who does not know this language. In other words, a textbook in the spirit of "Imagine that you are the compiler, and we go from there."

I have been learning this language since yesterday. The syntax looks like chaos: I do not understand the sequence of steps the compiler performs while parsing, which subsystem of the language a given expression belongs to, or what ends up being passed to language constructs and functions. There are all these symbols, and it is unclear how they work. This is very different from what I am used to in Tcl, where there are substitutions, commands, and their parameters.

  • Show me an example of a line where you "do not understand to which subsystem of the language the expression belongs." - Vladimir Martyanov
  • printf("Argument %d: %s\n", i, argv[i]); for example. I do not understand at what level all of this is processed. Are these all parameters that are passed to printf unchanged, and does it process them internally? - jto
  • Elementary: it is a function call (someName();). Three parameters: "Argument %d: %s\n", i, argv[i]. Open the documentation for the function and see what the parameters mean. Something like that. - Vladimir Martyanov
  • @jto, change your approach and think in black boxes. That same printf function in most cases turns into assembly mush for a specific system, like everything else (moreover, it does not exist in the language itself but is implemented at the OS level). You need not care how it does its job; what matters is that it does exactly what the documentation describes. - Alex Krass
  • And C is nice because there are not that many "language elements". Really it is just if, while, for, do, switch; I do not think I forgot anything that takes parentheses. Everything else with parentheses is a function. - Mike

1 answer

You should think like this.

  1. The overall picture. C files are compiled separately; the compiler knows nothing about other files unless they are explicitly mentioned in the file being compiled. Other files are “pulled in” by the preprocessor. (For the purists: yes, one can feed a .h file to the compiler via the Makefile, but let's not complicate the picture unnecessarily.)

  2. Preprocessor. It walks through the code and performs dumb textual macro substitution. #define X(Y, Z) for (int i = 0; i < Y; i = i * Z) causes X(10, 2 + 1) to turn into for (int i = 0; i < 10; i = i * 2 + 1). Note that the argument is pasted in as text, so the result is i * 2 + 1, not i * (2 + 1). The preprocessor does, however, know about string literals and performs no macro substitution inside them. It also handles #include by mechanically inserting the contents of the named file at that point.
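
     The textual nature of macro substitution can be seen in a small sketch. The macro names MUL_NAIVE, MUL_SAFE and GREETING and the helper functions are made up for this illustration; only the substitution behavior itself comes from the language.

     ```c
     #include <assert.h>
     #include <stdio.h>

     /* Hypothetical macros, for illustration only. */
     #define MUL_NAIVE(a, b) a * b          /* arguments are pasted in as text */
     #define MUL_SAFE(a, b)  ((a) * (b))    /* parentheses restore the intended grouping */
     #define GREETING "hello"

     /* Expands to 1 + 2 * 3, so the multiplication binds first. */
     int naive_product(void) { return MUL_NAIVE(1 + 2, 3); }

     /* Expands to ((1 + 2) * (3)). */
     int safe_product(void) { return MUL_SAFE(1 + 2, 3); }

     /* A macro name inside a string literal is left alone. */
     const char *inside_string(void) { return "GREETING"; }

     int main(void)
     {
         assert(naive_product() == 7);
         assert(safe_product() == 9);
         /* Prints the untouched literal first, then the expanded macro. */
         printf("%s vs %s\n", inside_string(), GREETING);
         return 0;
     }
     ```

     Most compilers can show the preprocessed text on its own (for GCC and Clang the option is -E), which is a convenient way to see exactly what the compiler proper receives.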

  3. String preprocessing. Inside string and character literals, some character sequences are replaced by others. For example, \n is replaced by the character with code 10. Also, wide string literals (wchar_t*) may be converted from the character set of the source file to UCS-2 or UCS-4, depending on the compiler.
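
     A minimal sketch of what escape replacement means in practice. The function names are hypothetical, and the value 10 assumes an ASCII-based execution character set, which is what practically all current systems use.

     ```c
     #include <assert.h>
     #include <stdio.h>
     #include <string.h>

     /* Length of a literal that contains one escape sequence. */
     size_t escaped_length(void) { return strlen("a\nb"); }

     /* Numeric code of the newline character (10 on ASCII-based systems). */
     int newline_code(void) { return '\n'; }

     int main(void)
     {
         /* "a\nb" holds three characters: 'a', newline, 'b'. */
         assert(escaped_length() == 3);
         /* Doubling the backslash keeps it literal: 'a', '\', 'n', 'b'. */
         assert(strlen("a\\nb") == 4);
         printf("code of the newline character: %d\n", newline_code());
         return 0;
     }
     ```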

  4. The compiler proper. There is no magic in the compiler. There are keywords (for, if, etc.) and functions. For example, printf is a function (from the standard library): writing printf("%d\n", 15); produces, in the compiled code, a call to the printf function with the parameters "%d\n" and 15. Similarly, the call printf("%d\n", ""); produces a call to printf with the parameters "%d\n" and "" (this call goes wrong at run time: the behavior is undefined because the argument does not match the format). The compiler is allowed to know the exact semantics of the printf format string and may issue a warning if it sees that the parameter types do not fit the format string.
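
     To make "printf is just a function" concrete, here is a small sketch that takes its address and calls it through a pointer. The wrapper name call_through_pointer is made up; the check on the return value assumes that standard output is writable.

     ```c
     #include <assert.h>
     #include <stdio.h>

     /* Calls printf through a pointer, like any other function.
        Returns what printf returns: the number of characters written. */
     int call_through_pointer(int value)
     {
         int (*fn)(const char *, ...) = printf;  /* an ordinary function pointer */
         return fn("%d\n", value);
     }

     int main(void)
     {
         int written = call_through_pointer(15);  /* prints "15" and a newline */
         assert(written == 3);                    /* '1', '5', '\n' */
         return 0;
     }
     ```

     The format string is interpreted inside printf at run time; the compiler only arranges for the arguments to be passed.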

  5. Optimizer. It may replace any construct with a more efficient one, following the as-if rule: if the change makes no difference to the final output and to the values visible to the user, the transformation is valid. Example: if you have a long computation without side effects whose result you never use (that is, never output), the optimizer may throw it away. It may equally well keep it. Another example: the order in which the operands of A() + B() are evaluated is unspecified, so even if the functions A and B have side effects, the compiler may call them in either order. If you need A() to be evaluated strictly before B(), store its result in an explicit extra variable.
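
     The advice about the extra variable can be sketched as follows. A and B stand for the functions from the text; their bodies, the counter variable and the other function names are assumptions made up for this illustration.

     ```c
     #include <assert.h>
     #include <stdio.h>

     int counter = 0;   /* shared state that records the call order */

     /* Hypothetical functions with side effects. */
     int A(void) { counter = counter * 10 + 1; return 1; }
     int B(void) { counter = counter * 10 + 2; return 2; }

     /* Order of the two calls is unspecified: counter may end up 12 or 21. */
     int sum_unordered(void) { counter = 0; return A() + B(); }

     /* Separate statements force A() to run before B(): counter is always 12. */
     int sum_ordered(void)
     {
         counter = 0;
         int a = A();
         int b = B();
         return a + b;
     }

     int call_trace(void) { return counter; }

     int main(void)
     {
         int s = sum_unordered();
         printf("unordered: sum=%d trace=%d\n", s, call_trace());

         s = sum_ordered();
         assert(s == 3 && call_trace() == 12);
         printf("ordered:   sum=%d trace=%d\n", s, call_trace());
         return 0;
     }
     ```

     The sum is 3 either way; only the side effect recorded in counter reveals which call ran first.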

  6. Undefined behavior. Here be dragons. There is a fairly large set of run-time situations (for example, dereferencing a null pointer, accessing an array out of bounds (!), or signed integer overflow) in which the compiler is no longer responsible for the result. The compiler may assume that these never happen, draw non-trivial conclusions from that, and use them to simplify the code. For example, given the code

     int m[1];
     if (cond) {
         printf("Heh-heh\n");
         return;
     }
     for (int i = 0; i < 2; i++) {
         m[i] = 0;
     }

    the compiler may assume that the access to m[1] never happens, hence the loop is never executed, hence the function must leave through the early return, hence cond must be true, which means cond need not be evaluated at all, and the whole function can be simplified to

     printf("Heh-heh\n");
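
     Since the compiler takes no responsibility here, the checks have to live in the program itself. Below is a hedged sketch of how two of the situations listed above might be avoided; checked_add and zero_fill are hypothetical helpers, not standard library functions.

     ```c
     #include <assert.h>
     #include <limits.h>
     #include <stdbool.h>
     #include <stddef.h>

     /* Stores a + b in *out only when the sum fits in an int.
        The test runs before the addition, so no overflow ever occurs. */
     bool checked_add(int a, int b, int *out)
     {
         if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
             return false;
         *out = a + b;
         return true;
     }

     /* Zeroes n elements; the caller supplies the real element count. */
     void zero_fill(int *m, size_t n)
     {
         for (size_t i = 0; i < n; i++)
             m[i] = 0;
     }

     int main(void)
     {
         int m[1] = { 42 };
         zero_fill(m, sizeof m / sizeof m[0]);  /* bound derived from the array itself */
         assert(m[0] == 0);

         int r = 0;
         bool ok = checked_add(1, 2, &r);
         assert(ok && r == 3);
         ok = checked_add(INT_MAX, 1, &r);
         assert(!ok);                           /* rejected instead of overflowing */
         return 0;
     }
     ```
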
  • By the way, the printf example is rather curious in general. I, for one, do not understand how printf does NOT contradict the syntax of C. There are no functions with an arbitrary number of arguments. If anyone can give me a link that describes how it works "from the inside" (but not a link to the code; I am not sure printf's code is even written in C), I will be very grateful - andy.37
  • @andy.37 C does have functions with a variable number of arguments; in the declaration, the last parameter is written as '...' - Mike
  • @andy.37 Here is the first thing Google turned up: rsdn.ru/forum/cpp/418970.1 - Mike
  • @Mike, yeah, shame on my gray head. I forgot (((. Now it is all clear. - andy.37
  • @VladD "You should think like this" - is that from the new State Duma law regarding the citizens of Russia? :) - Vlad from Moscow