I want to write a code translator in Python that translates PHP code into C++. I picture the translator like this: there is a program (.py) that takes the name of a .php file as input. The program reads this file and, as it goes, creates a file with the same name but with the .cpp extension. While reading, the first file is parsed, i.e. if the first file contains:

    $name = "Alex";
    echo $name;

then the following should be in the .cpp file:

    #include <iostream>
    #include <string>

    int main() {
        std::string name = "Alex";
        std::cout << name << std::endl;
        return 0;
    }

Do I understand correctly how a translator works?

1 answer

It is not that simple. First you need to write a parser that analyzes the code being translated (PHP in your case).

Since you have decided to use Python, take a look at SPARK.

The module provides an extraordinary knowledge of the tools for the de-

Here is an example of an interpreter for integer math expressions. You need to start with lexical analysis. The essence of lexical analysis is breaking the incoming text into a sequence of elementary components, called tokens. A token can be an operator (for example + or -), an identifier (the name of a variable, a method, a class, etc.), a separator (comma, colon ...), and so on.
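To make the idea of tokens concrete, here is a minimal sketch that uses only Python's standard `re` module rather than SPARK. The token names (`INTEGER`, `IDENTIFIER`, `OPERATOR`, `SEPARATOR`) and the `tokenize` function are assumptions chosen for illustration; they are not part of any library.

```python
import re

# Hypothetical token categories for illustration. Order matters:
# the first alternative that matches at a position wins.
TOKEN_SPEC = [
    ("INTEGER",    r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[+\-*/]"),
    ("SEPARATOR",  r"[,:;]"),
    ("SPACE",      r"\s+"),
    ("ERROR",      r"."),
]
MASTER_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Split text into (type, value) pairs, skipping whitespace."""
    tokens = []
    for match in MASTER_RE.finditer(text):
        kind = match.lastgroup   # name of the rule that matched
        value = match.group()
        if kind == "SPACE":
            continue             # whitespace carries no meaning here
        if kind == "ERROR":
            raise SyntaxError("unexpected character %r" % value)
        tokens.append((kind, value))
    return tokens


print(tokenize("total + 42, x"))
```

Each pair in the result names the category of a token and the exact text it covers, which is the information a parser needs for the next stage.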

The GenericScanner class helps with this. First, derive your own scanner from it:

    from spark import GenericScanner

    class Scanner(GenericScanner):
        def __init__(self):
            # Calling the base initializer is mandatory!
            GenericScanner.__init__(self)
            # Create the list in which tokens will be stored
            self.tokens = []

Tokens are stored as instances of a simple container class. Here is its code:

    class Token:
        def __init__(self, type, value):
            """
            type - the token type (operator, identifier, etc.)
            value - its value
            """
            self.type = type
            self.value = value

        # A couple more methods are needed: for comparing tokens
        # and for the string representation
        def __eq__(self, other):
            # A token is compared against a type name, not another token
            return self.type == other

        def __ne__(self, other):
            return not self == other

        def __repr__(self):
            return '%s(%r)' % (self.type, self.value)

Next, you need to create the methods that process tokens. These methods must satisfy certain conditions: first, the method name must begin with "t_"; second, the method must take one argument, the token itself (the matched text); and third, the first line of the method must be a docstring containing the regular expression that describes the token. Let's write a method for recognizing integers:

        def t_integer(self, token):
            r'\d+'  # Decimal integers only
            # Add the token to the list
            self.tokens.append(Token('INTEGER', int(token)))

Next, a method for recognizing operators:

        def t_operator(self, token):
            r'[\+\-\*\/]'  # Only plus, minus, multiplication and division
            self.tokens.append(Token('OPERATOR', token))

Now we need a method for handling whitespace:

        def t_space(self, token):
            r'\s+'  # Any whitespace characters
            # Skip them
            pass

And finally, it is a good idea to override the t_default method. It is called when the encountered character is not described by any of the previous methods:

        def t_default(self, token):
            r'[\s\S]+'  # Any characters
            # Syntax error!
            raise SyntaxError(str(token))
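The naming and docstring conventions described above can be illustrated with a toy scanner built on the standard `re` module. This is only a sketch of the general idea and is not SPARK's actual implementation; `MiniScanner` and `DemoScanner` are hypothetical names introduced for this example.

```python
import re


class MiniScanner:
    """Toy scanner: collects t_* methods and uses their docstrings as regexes."""

    def __init__(self):
        parts = []
        for name in sorted(dir(self)):
            if name.startswith("t_"):
                pattern = getattr(self, name).__doc__
                # Each rule becomes a named group so its handler can be found later.
                parts.append("(?P<%s>%s)" % (name, pattern))
        self.pattern = re.compile("|".join(parts))

    def tokenize(self, text):
        pos = 0
        while pos < len(text):
            match = self.pattern.match(text, pos)
            if match is None:
                raise SyntaxError("cannot scan %r" % text[pos:])
            # Dispatch to the method whose regex matched.
            getattr(self, match.lastgroup)(match.group())
            pos = match.end()


class DemoScanner(MiniScanner):
    def __init__(self):
        MiniScanner.__init__(self)
        self.tokens = []

    def t_integer(self, token):
        r"\d+"
        self.tokens.append(("INTEGER", int(token)))

    def t_space(self, token):
        r"\s+"
        pass


scanner = DemoScanner()
scanner.tokenize("12 7")
print(scanner.tokens)
```

The point of the design is that adding a new kind of token only requires adding one more `t_` method; the scanning loop itself does not change.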

The scanner is ready. You can use it as follows:

    scanner = Scanner()
    scanner.tokenize(data)
    tokens = scanner.tokens

where data is the text to be scanned.

So, we now have a scanner that processes the text and outputs a stream of tokens (or raises an error if it encounters an unknown character).
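To connect a token stream back to the original question, here is a rough sketch that tokenizes the PHP snippet from the question and emits the C++ the asker expects. It uses the standard `re` module instead of SPARK, and it handles only two statement forms (string assignment and `echo` of a variable). The names `PHP_TOKEN_RE`, `tokenize` and `translate` are assumptions made for this example; a real translator would need a full grammar.

```python
import re

# Hypothetical token rules covering only the snippet from the question.
PHP_TOKEN_RE = re.compile(r'''
    (?P<ECHO>\becho\b)
  | (?P<VARIABLE>\$[A-Za-z_]\w*)
  | (?P<STRING>"[^"\\]*")
  | (?P<ASSIGN>=)
  | (?P<SEMI>;)
  | (?P<SPACE>\s+)
''', re.VERBOSE)


def tokenize(source):
    """Return a list of (type, text) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(source):
        match = PHP_TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError("unexpected text at %r" % source[pos:pos + 10])
        if match.lastgroup != "SPACE":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def translate(source):
    """Translate the two supported PHP statement forms into a C++ program."""
    tokens = tokenize(source)
    body, i = [], 0
    while i < len(tokens):
        kinds = [kind for kind, _ in tokens[i:i + 4]]
        if kinds == ["VARIABLE", "ASSIGN", "STRING", "SEMI"]:
            # $name = "Alex";  ->  std::string name = "Alex";
            name, literal = tokens[i][1][1:], tokens[i + 2][1]
            body.append("    std::string %s = %s;" % (name, literal))
            i += 4
        elif kinds[:3] == ["ECHO", "VARIABLE", "SEMI"]:
            # echo $name;  ->  std::cout << name << std::endl;
            # (std::endl mirrors the output shown in the question)
            body.append("    std::cout << %s << std::endl;" % tokens[i + 1][1][1:])
            i += 3
        else:
            raise SyntaxError("unsupported statement near token %d" % i)
    lines = ["#include <iostream>", "#include <string>", "", "int main() {"]
    lines += body + ["    return 0;", "}"]
    return "\n".join(lines)


print(translate('$name = "Alex"; echo $name;'))
```

Matching fixed token patterns like this stops scaling almost immediately (expressions, nesting, types), which is why a proper parser is the next step after the scanner.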

As you can see, it is not that simple. I happened to look into this question earlier, remembered this article, and have quoted it almost in full for reference. From here on, you will have to work through the topic yourself.