Understanding Python's internal mechanisms

Question

I started using Python for my own purposes not long ago, and a question came up about how the language itself is implemented. I'll go through the unclear points in order:

Python, as I understand it, is written in C. But then what is Cython? Is it the same thing?
If it is written in C, how is it interpreted? My guess is that each line is compiled separately and run immediately; is that true? Can I read the source code somewhere?
There is information online about a Python virtual machine. What is it? I know that Java has one. But Java is not an interpreted language: compilation produces bytecode, which the JRE then interprets on the fly for a specific machine. So is there no resemblance to the JRE... or is there?

Community spirit ♦ one · Accepted Answer · 2016-06-29T10:55:26

Strictly speaking, there are several interpreters for the Python language, written in different programming languages.

The reference implementation is CPython, an interpreter written in C.

There are also implementations in Java (Jython), for .NET (IronPython), and even in Python itself (PyPy).

In most cases the reference implementation is all you need.

Do not confuse CPython with Cython; they are different things. The first is the interpreter; the second is an extension of the language that is compiled to C. Since CPython is written in C, C-specific constructs can be used alongside it, and this is what the Cython project makes possible: in Python-like code you can declare a C struct and then work with it. This is needed extremely rarely, by the way, and is used mostly in very specific projects.

Python programs are translated into bytecode when you run them. For imported modules this creates files with the .pyc extension (compiled Python). They can be run even without the original .py file (and can even be decompiled back to .py if they have not been obfuscated). Previously, .pyo files were also created; these held optimized bytecode (produced with the -O flag) rather than object files, and since Python 3.5 they are no longer created. The location of the cached files also changed in Python 3 (since 3.2): they are now placed in the __pycache__ subdirectory.
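
As a rough illustration of the caching described above, the sketch below writes a throwaway module to a temporary directory and compiles it with the standard py_compile module. The file name hello.py and the variable names are made up for the example, and the exact name of the .pyc file depends on the interpreter version.

```python
import os
import py_compile
import tempfile

# Hypothetical throwaway module, created only for this demonstration.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "hello.py")
with open(source_path, "w") as handle:
    handle.write("print('hello')\n")

# Compile the source to bytecode. The return value is the path of the
# cached file, by default inside a __pycache__ directory next to the source.
pyc_path = py_compile.compile(source_path, doraise=True)
print(pyc_path)
```

With default settings the printed path points into __pycache__ and has the form hello.cpython-<version>.pyc.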

The Python sources can be found on python.org (there is also a read-only mirror on GitHub).

Added:

Let's take a closer look at Python versus Java.

Java has an interpreter as well as a virtual machine, and Python has a virtual machine as well as an interpreter. The reason “virtual machine” is the more common term for Java and “interpreter” the more common term for Python lies in the big difference between the two languages: static typing in Java versus dynamic typing in Python. Here, when I speak of types, I mean the basic data types: the in-memory data representations that the language works with.

The Java virtual machine has it easy: it requires the programmer to declare the type of each variable in the code. This provides enough information for Java bytecode not only to be interpreted by the Java virtual machine, but even to be compiled into machine instructions.

The Python virtual machine is more complicated in that it takes on an additional task: before performing each operation, it pauses to determine the data type of each variable or data structure involved. Python frees the programmer from thinking at the level of basic data types * and lets you concentrate on a higher level of abstraction. The price of this freedom is performance. "Interpreter" is the preferred term for Python because it has to pause to inspect data types, and also because the concise syntax of dynamic languages is better suited to interactive interfaces. There are no technical barriers to building an interactive Java interface, but typing statically typed code interactively (in some console interpreter) would be tedious, so nobody does it.
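
To make the idea of "determining types before each operation" concrete, here is a minimal sketch using the standard dis module. The function add is made up for the example. The opcode names differ between CPython versions, so no expected listing is shown.

```python
import dis

def add(a, b):
    # The source says nothing about the types of a and b.
    return a + b

# The same compiled function works on ints, strings and lists:
# the meaning of "+" is looked up from the operands at run time.
results = [add(1, 2), add("py", "thon"), add([1], [2])]
print(results)

# Collect the bytecode instruction names of the function body.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)
```

The instruction list contains a single generic "binary" instruction for the addition, with no hint about operand types; the virtual machine has to work those out each time the instruction runs.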

In the Java world, the virtual machine takes center stage, because it runs programs written in a language that can actually be compiled into native code, which gives high speed and efficient use of resources. Java bytecode can be executed by a Java virtual machine with performance approaching that of programs compiled into native code. The Java virtual machine puts Java in a category of its own:

portable interpreted statically typed language

LLVM is the closest thing to it, but LLVM operates at a different level:

portable interpreted assembly language

The term "bytecode" is used in both Java and Python, but not all bytecode is created equal. Bytecode is only a general term for the intermediate languages used by compilers and interpreters. Even a C compiler such as gcc uses an intermediate language (or several) internally. Java bytecode contains information about basic data types, whereas Python bytecode does not. In this sense, the Python virtual machine (like those of Bash, Perl, Ruby, etc.) is indeed much slower than the Java virtual machine, or, to put it more simply, it has more work to do. It is worth noting what kind of information the different bytecode formats contain:

LLVM: CPU registers
Java: basic data types
Python: user-defined data types

To draw an analogy from the real world: LLVM works at the atomic level, Java at the molecular level, and Python works with materials. Since everything must ultimately be broken down into subatomic particles (machine operations), the Python virtual machine has the hardest job.

Interpreters and compilers for statically typed languages do not carry the same burden as those for dynamically typed languages. Programmers in statically typed languages have to pick up the slack themselves, and the payoff is performance. However, just as all non-deterministic functions are actually deterministic, so all dynamically typed languages are actually statically typed. The performance differences between the two classes of languages should therefore level out over time, at which point Python can be renamed HAL 9000.

The virtual machines of dynamic languages such as Python implement a more idealized logical machine and do not necessarily correspond closely to real physical hardware. The Java virtual machine, by contrast, is more like a classic C compiler, except that instead of emitting machine code it executes built-in routines. In Python, an integer is an object with a bunch of attributes and methods attached to it. In Java, an int is a fixed-size sequence of bits (32 of them). So this is not an entirely fair comparison: integers in Python should be compared with the Integer class in Java. The int data type in Java cannot be compared to anything in Python, because Python simply lacks this level of abstraction, and so does its bytecode. **
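
The claim that a Python integer is a full object can be checked directly. The sketch below uses only built-ins and the standard sys module; the variable names are illustrative, and the reported size is specific to the interpreter (CPython is assumed here).

```python
import sys

n = 42

# An int is an ordinary object with methods and attributes.
is_object = isinstance(n, object)
bits_needed = n.bit_length()      # 42 is 0b101010, so 6 bits
method_names = [name for name in dir(n) if not name.startswith("_")]

# Python ints are not limited to 32 or 64 bits.
big = 2 ** 100

# Size of the int object in bytes; on CPython this is far more than
# the 4 bytes of a Java int, because of the object header.
size_in_bytes = sys.getsizeof(n)

print(is_object, bits_needed, size_in_bytes)
print(method_names)
```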

Since all variables in Java are statically typed, one can reasonably expect an interpreter like Jython to perform in the same ballpark as CPython. On the other hand, a Java virtual machine implemented in Python would almost certainly be much slower. And don't expect Ruby, Perl, and the like to fare any better: they were not designed for this. They were designed for "scripting", which is what programming in dynamic languages is called.

Every operation performed in a virtual machine must eventually reach real hardware. Virtual machines contain precompiled routines that are general enough to execute any sequence of logical operations. A virtual machine may not emit new machine instructions, but it certainly executes its own internal routines over and over, in arbitrarily complex chains. The Java virtual machine, the Python virtual machine, and all other general-purpose virtual machines are equivalent in the sense that you can program them to perform any logical sequence, but they differ in which tasks they take on themselves and which they leave to the programmer.
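
The idea of a virtual machine as a fixed set of routines driven by a sequence of instructions can be sketched with a toy stack machine. This is not how CPython or the JVM is implemented; the opcodes PUSH, ADD and MUL and the function run are invented for the illustration.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs on a tiny stack machine."""
    stack = []

    # The routines are fixed in advance; a program only chooses
    # the order in which they are called.
    def push(argument):
        stack.append(argument)

    def add(_):
        right = stack.pop()
        left = stack.pop()
        stack.append(left + right)

    def mul(_):
        right = stack.pop()
        left = stack.pop()
        stack.append(left * right)

    routines = {"PUSH": push, "ADD": add, "MUL": mul}

    for opcode, argument in program:
        routines[opcode](argument)
    return stack.pop()

# (2 + 3) * 4 expressed as instructions for the toy machine.
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
result = run(program)
print(result)
```

No new machine code is produced while the program runs: the toy machine only calls its three built-in routines in the order the instructions dictate.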

The moral of this story is that information about basic data types really helps a compiler or virtual machine.

Finally, to thoroughly confuse everyone, imagine this: a Python program executed by a Python interpreter/virtual machine implemented in Java, running on a Java interpreter/virtual machine implemented on LLVM, running in QEMU on an iPhone.

via https://stackoverflow.com/a/1732383/3049150

Remarks:

* However, you still need to take care of copying complex data types yourself. This does not concern strings, numbers (int), or bools: they are immutable, so they behave as if copied without problems. Mutable data structures, however, are passed around by reference; to copy them in full (for example, nested dictionaries) you must use the deepcopy function from the copy module.

** In general, you can get an ordinary C int by using Cython. But as far as I know this works only with CPython, and it is not part of the core Python infrastructure.
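
The first remark (about copying) can be illustrated with the standard copy module. The dictionary contents and variable names below are made up for the example.

```python
import copy

config = {"db": {"host": "localhost", "ports": [5432]}}

alias = config                   # no copy at all: a second name for the same dict
shallow = copy.copy(config)      # new outer dict, but the nested objects are shared
deep = copy.deepcopy(config)     # fully independent copy, nested objects included

# Mutate a nested structure through the original name.
config["db"]["ports"].append(5433)

print(alias["db"]["ports"])      # sees the change
print(shallow["db"]["ports"])    # also sees the change (shared inner dict)
print(deep["db"]["ports"])       # unaffected
```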

Addition 2:

Python 3 also made it possible to annotate function arguments with types. How to work with all this can now be read in the documentation. However, it is still seen extremely rarely: it will not be backported to Python 2, so code that has to stay compatible naturally avoids it. The official end-of-life date for Python 2 is 2020.

The official position of the Python developers is that this was done so that static analyzers can detect errors. But it seems to me that it is also meant to make Python faster. Previously, all requirements on input arguments were written in the docstring, and there was no uniform format.
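
A small sketch of what such annotations look like in practice. The function repeat is invented for the example. Note that the standard interpreter stores the annotations but does not check them when the function is called; checking is left to external static analyzers.

```python
def repeat(text: str, times: int) -> str:
    """Return text repeated the given number of times."""
    return text * times

# The annotations are kept as metadata on the function object.
annotations = repeat.__annotations__
print(annotations)

# A call that matches the annotations.
matching = repeat("ab", 2)

# A call that contradicts them still runs: nothing is enforced at run time.
mismatching = repeat([0], 3)

print(matching, mismatching)
```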

But if bytecode is produced when the program starts, then this is not a purely interpreted language, since
@faoxis Yes, but only the modules that are imported while the script runs are compiled into bytecode.
The module that the user runs directly is not compiled into bytecode.
Also, there is an interactive mode, in which the code is not compiled.
So interactive mode really compiles nothing and everything is simply interpreted directly? (Under the hood it surely does compile; it just doesn't save the result.)
@faoxis: it is worth mentioning that many things in this answer are FeroxTL's personal opinions, and other Python programmers may hold different views (or interpret the facts differently). For example: Cython is a language (an extension of Python for interacting with C more efficiently); bytecode is not a mandatory part of Python; and, just like Java, Python code can be compiled into machine code (by a JIT compiler at run time); and so on.
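
On the question raised in the comments of whether code typed interactively gets compiled, the built-in compile function gives a hint. The sketch below compiles a one-line statement in "single" mode (the mode intended for interactive statements) and then executes the resulting code object. The statement and the variable names are made up, and nothing is written to disk.

```python
import types

source = "total = 1 + 2"

# Compile the statement in memory; "<stdin>" is just a label
# for the origin of the source text.
code_obj = compile(source, "<stdin>", "single")
print(type(code_obj))

# Execute the compiled code object in a fresh namespace.
namespace = {}
exec(code_obj, namespace)
print(namespace["total"])
```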
