Java source code consists of whitespace, identifiers, literals, comments, operators, delimiters, and keywords.
What does the compiler do with each of these elements? Is anything filtered out or otherwise transformed?
It is common practice to split a compiler into several phases. Traditionally, the first phase is lexical analysis, which divides the source text into lexemes: the code is read as a sequence of characters and turned into a sequence of tokens.
A token consists of a token type and a value (packed together in one class).
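As a minimal sketch of "type plus value packed in one class" (assuming Java 16+ for records): the enum TokenType and the record Token below are illustrative names, not taken from any real compiler, and the set of token types is deliberately tiny.

```java
import java.util.List;

// Hypothetical token kinds; a real Java lexer distinguishes many more
// (each keyword, each operator, each kind of literal).
enum TokenType {
    KEYWORD, IDENT, STRING_LITERAL, INT_LITERAL, SEPARATOR
}

// One token = its type plus its value, packed together in one class.
// A record provides equals/hashCode and accessors automatically.
record Token(TokenType type, String value) {
    @Override
    public String toString() {
        return "[" + type + " \"" + value + "\"]";
    }
}

class TokenExample {
    public static void main(String[] args) {
        // The identifier Example becomes an IDENT token whose value is its name.
        List<Token> tokens = List.of(
            new Token(TokenType.KEYWORD, "class"),
            new Token(TokenType.IDENT, "Example"),
            new Token(TokenType.SEPARATOR, "{"));
        tokens.forEach(System.out::println);
    }
}
```

Using a record means two tokens with the same type and value compare equal, which is convenient when later phases (or tests) compare token sequences.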
At this stage, whitespace (outside character and string literals) is usually discarded, and identifiers are turned into tokens of type "Identifier" whose value is the identifier's name as a string. Literals are turned into tokens as well. Comments usually do not make it past the lexical analysis stage and are simply discarded. Separators such as parentheses and punctuation marks each form their own token type. Keywords, too, are usually distinguished by separate token types.
Example:
The source text

public class Example { // example
    public static void main(String[] args) {
        System.out.println(/* this text will be printed */"hello world");
    }
}

produces the following sequence of lexical tokens:

[public-keyword] [class-keyword] [ident "Example"] [separator-left-brace]
[public-keyword] [static-keyword] [void-keyword] [ident "main"] [separator-left-paren] [ident "String"] [separator-left-brack] [separator-right-brack] [ident "args"] [separator-right-paren] [separator-left-brace]
[ident "System"] [separator-dot] [ident "out"] [separator-dot] [ident "println"] [separator-left-paren] [string-literal "hello world"] [separator-right-paren] [separator-semicolon]
[separator-right-brace]
[separator-right-brace]

Later compilation phases break this sequence down into definitions of classes, methods, and statements, check that names are used consistently, bind names to the objects they denote, check that the program is meaningful, optimize it, and compile it into bytecode.
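To make the lexing rules above concrete, here is a minimal hand-written lexer sketch (Java 16+) that reproduces the token sequence of this example. SimpleLexer, its Kind enum, and its keyword subset are hypothetical; it handles only what the example needs (no operators, no numeric or character literals, and string escapes are kept raw), so it illustrates the idea and is not how javac's own scanner is written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SimpleLexer {
    enum Kind { KEYWORD, IDENT, STRING_LITERAL, SEPARATOR }

    // Token = type + value; toString mimics the notation used in the text.
    record Token(Kind kind, String value) {
        @Override
        public String toString() {
            return switch (kind) {
                case KEYWORD -> "[" + value + "-keyword]";
                case IDENT -> "[ident \"" + value + "\"]";
                case STRING_LITERAL -> "[string-literal \"" + value + "\"]";
                case SEPARATOR -> "[separator-" + SEPARATORS.get(value.charAt(0)) + "]";
            };
        }
    }

    // Only a subset of Java's keywords, enough for the example (assumption).
    static final Set<String> KEYWORDS = Set.of(
        "public", "private", "class", "static", "void", "int", "return", "new", "if", "else");

    static final Map<Character, String> SEPARATORS = Map.of(
        '(', "left-paren", ')', "right-paren",
        '{', "left-brace", '}', "right-brace",
        '[', "left-brack", ']', "right-brack",
        ';', "semicolon", ',', "comma", '.', "dot");

    static List<Token> tokenize(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                        // whitespace: discarded
            } else if (src.startsWith("//", i)) {
                while (i < src.length() && src.charAt(i) != '\n') i++;  // line comment: discarded
            } else if (src.startsWith("/*", i)) {
                int end = src.indexOf("*/", i + 2);
                if (end < 0) throw new IllegalArgumentException("unterminated comment");
                i = end + 2;                                // block comment: discarded
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) i++;
                String word = src.substring(start, i);
                // Keywords get their own token type; every other word is an identifier.
                tokens.add(new Token(KEYWORDS.contains(word) ? Kind.KEYWORD : Kind.IDENT, word));
            } else if (c == '"') {
                int start = ++i;
                while (i < src.length() && src.charAt(i) != '"') {
                    if (src.charAt(i) == '\\') i++;         // skip the escaped char (kept raw)
                    i++;
                }
                if (i >= src.length()) throw new IllegalArgumentException("unterminated string");
                tokens.add(new Token(Kind.STRING_LITERAL, src.substring(start, i)));
                i++;                                        // consume the closing quote
            } else if (SEPARATORS.containsKey(c)) {
                tokens.add(new Token(Kind.SEPARATOR, String.valueOf(c)));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String src = """
            public class Example { // example
                public static void main(String[] args) {
                    System.out.println(/* this text will be printed */"hello world");
                }
            }
            """;
        StringBuilder out = new StringBuilder();
        for (Token t : tokenize(src)) out.append(t).append(' ');
        System.out.println(out.toString().trim());
    }
}
```

Running main prints the token list shown above on a single line. Both comments simply vanish, and String, System, and out come out as plain identifiers: they are names, not keywords.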
Lexical analysis is the easiest compilation phase.
Yes, it is theoretically possible (and sometimes necessary) to build compilers in which lexical analysis is effectively merged with the subsequent compilation phases. In principle, nothing forces compiler authors to single out a separate lexical analysis phase, but doing so is still common practice.
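One way to picture lexing merged into the next phase is a scannerless recursive-descent parser. The sketch below (hypothetical class ScannerlessCalc, with a toy grammar of integer addition and subtraction with parentheses) has no token list at all: its parsing methods skip whitespace and read digits themselves. It illustrates the idea only and is not a claim about how the Java compiler is structured.

```java
// A toy scannerless parser: no separate lexer, no token objects.
class ScannerlessCalc {
    private final String src;
    private int pos = 0;

    private ScannerlessCalc(String src) { this.src = src; }

    static int evaluate(String src) {
        ScannerlessCalc p = new ScannerlessCalc(src);
        int value = p.expr();
        p.skipSpaces();
        if (p.pos != src.length()) throw new IllegalArgumentException("trailing input at " + p.pos);
        return value;
    }

    // The "lexical" work (skipping whitespace, reading digits) is done
    // inline by the parsing methods instead of by a separate phase.
    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private boolean accept(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    // expr := term (('+' | '-') term)*   (left-associative)
    private int expr() {
        int value = term();
        while (true) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    // term := number | '(' expr ')'
    private int term() {
        if (accept('(')) {
            int value = expr();
            if (!accept(')')) throw new IllegalArgumentException("expected ')' at " + pos);
            return value;
        }
        skipSpaces();
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("expected number at " + pos);
        return Integer.parseInt(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(evaluate("12 + (3 - 4)"));  // prints 11
    }
}
```

The price of this design is that whitespace and number-reading logic are scattered through the grammar methods; a separate lexer keeps that detail in one place, which is one reason the split remains common.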
- VladD
Source: https://ru.stackoverflow.com/questions/617492/