I need to write a parser for a programming language with a fairly complex syntax (reference points: Ruby, Smalltalk). It has to handle operators with different precedence.

How can I make this easier and more convenient? Speed is not that important.

UPD:

The list of syntax features that need to be parsed:

  • Optional parentheses when calling a method.
  • Operators with precedence (the precedence levels are fixed in advance).
  • Indentation marks the arguments of an operator that are carried over to a new line.

Example code to parse: here

UPD: I have not read the Dragon Book yet, but it is on its way.

UPD: Another question: how bad is it if my parser can only parse code by loading the entire source into memory? Is that fatal for a programming language?

Closed as off-topic by Nick Volynkin, Athari, Nofate on 13 Nov '15 at 22:03.

This question does not appear to fit the subject of the site. Those who voted to close it gave the following reason:

  • "Poll-style questions are forbidden on Stack Overflow in Russian. To get an answer, rephrase your question so that it can be given an unambiguously correct answer." - Athari, Nofate
If the question can be reworded to fit the rules set out in the help center, edit it.

  • Is there a description of your language's syntax? Any code examples? - Alex Medveshchek

5 answers

The choice is between syntax-directed translation (lex / yacc / bison / ...) and writing the parser by hand. If you have time, try both approaches on a small subset of the language to understand the pros and cons.

Ruby's parser uses syntax-directed translation (judging by its source code). The theory behind it is well described by Aho.

Among the generators, take a look at ANTLR; I have not tried it myself, but the reviews are good.

And once again: you cannot do without solid theoretical grounding. Look through a couple of books: the same Aho, or Sverdlov, or Mozgovoy.

  • +1 for ANTLR - cy6erGn0m
  • I am looking into ANTLR now; it seems to be what I need. - Vladimir Gordeev

To use Lex, Bison, etc. successfully, you have to use them constantly (like any tool). Otherwise, when you need to change something a year later, problems arise. Personally, I write parsers by hand in C, but naturally only after writing down at least a semi-formal description of the syntax (for myself).
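
For a sense of what the hand-written route can look like for operators with fixed precedence, here is a minimal sketch using precedence climbing. It is in Python rather than C for brevity. The grammar in the docstrings, the `PRECEDENCE` table, and all function names are assumptions made up for illustration; they are not the asker's language.

```python
import re

# Hypothetical precedence table: a higher number binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")


def tokenize(text):
    """Split the source string into number, identifier, and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_primary(tokens):
    """primary ::= NUMBER | IDENT | '(' expr ')'"""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        node = parse_expr(tokens, 1)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return node
    if token in PRECEDENCE or token == ")":
        raise SyntaxError(f"unexpected token {token!r}")
    return int(token) if token.isdigit() else token


def parse_expr(tokens, min_prec=1):
    """expr ::= primary (OP primary)*, left-associative, grouped by precedence."""
    left = parse_primary(tokens)
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # Only operators that bind tighter than `op` may extend the right operand.
        right = parse_expr(tokens, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left


def parse(text):
    """Parse a whole expression into nested (op, left, right) tuples."""
    tokens = tokenize(text)
    tree = parse_expr(tokens)
    if tokens:
        raise SyntaxError(f"unexpected token {tokens[0]!r}")
    return tree
```

With this table, `parse("1 + 2 * 3")` groups the multiplication first and `parse("1 - 2 - 3")` groups to the left. Adding an operator means adding one entry to the table, which is the main appeal of this technique when the precedence levels are fixed in advance.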

A remark about indentation: if you can do without it (for example, by using an explicit statement terminator or continuation marker), the parser is easier to write. In general, at first glance, code like this can be parsed in one pass.
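
If indentation has to stay, one possible way to keep the parser itself simple is to fold continuation lines into logical lines before tokenizing. The sketch below is in Python and rests on an assumed rule that is not taken from the question: a line indented deeper (with spaces) than the line that started the current statement continues that statement. The name `join_continuations` is hypothetical.

```python
def join_continuations(source):
    """Merge lines indented deeper than the statement that precedes them.

    Assumption: indentation is made of spaces and is used only to continue
    a statement, not to open a nested block.
    """
    logical_lines = []
    base_indent = None  # indentation of the line that started the statement
    for raw in source.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        indent = len(raw) - len(raw.lstrip(" "))
        if base_indent is not None and indent > base_indent:
            # Deeper indentation: glue the text onto the current statement.
            logical_lines[-1] += " " + raw.strip()
        else:
            logical_lines.append(raw.rstrip())
            base_indent = indent
    return logical_lines
```

For example, the three physical lines `total = a +`, `    b +`, `    c` become the single logical line `total = a + b + c`. A language that also uses indentation for blocks would need a different rule.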

Well, what can I say? For a compiler / interpreter project to be convenient to develop and to modify, the classical scheme is enough: Lexer -> Interpreter. As for how to build it, that cannot be explained in two words. RTFM, please. =)

Books:

Dragon Book: Compilers: Principles, Techniques, and Tools.

Niklaus Wirth: Compiler Construction

UPD: Another question: how bad is it if my parser can only parse code by loading the entire source into memory? Is that fatal for a programming language?

It is not fatal, if you do not mind the memory use. But there are no arguments in favor of loading everything into memory either; after all, everything can be parsed as a stream.

  • It is much easier for me to load everything into memory and make several passes than to parse it as a stream. - Vladimir Gordeev
  • The answer depends heavily on the language. If it is more declarative than imperative, loading it into memory is unavoidable. If you can process it on the fly, on the fly is better. Several passes are not good; design your data structures so that even in memory it is done in one pass. - andruxa
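
To make the stream-versus-memory trade-off concrete, here is a minimal Python sketch of a tokenizer written as a generator. It pulls one line at a time from any iterable of lines, so the same code serves both approaches. The token pattern and the name `stream_tokens` are assumptions for illustration only.

```python
import io
import re

# Hypothetical token pattern: numbers, identifiers, single punctuation characters.
TOKEN_RE = re.compile(r"\d+|[A-Za-z_]\w*|[^\sA-Za-z_\d]")


def stream_tokens(stream):
    """Yield (line_number, token) pairs, reading one line at a time."""
    for line_number, line in enumerate(stream, start=1):
        for match in TOKEN_RE.finditer(line):
            yield line_number, match.group()


# Streaming use: an open file or any other file-like object works.
for line_number, token in stream_tokens(io.StringIO("x = 1\ny = x + 2\n")):
    print(line_number, token)
```

If loading everything into memory turns out to be simpler, the same generator accepts `source.splitlines()`, so the decision can be postponed without rewriting the tokenizer.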

Try Bison or YACC (with examples)

  • Bison is a classic tool for this. But it is very old; maybe other, more convenient tools have appeared since? One look at Ruby's parser is scary. - Vladimir Gordeev
  • Hmm, Windows is even older :). There are downloads dated last year, which means the product was being developed at least fairly recently. - yapycoder
  • I mean fresh ideas, not a fresh version. - Vladimir Gordeev
  • I thought what was needed was a working solution that fits the formal requirements, but it turns out a "fresh" one is needed; it is not clear why, and it is not clear what is wrong with the "old" ones. - yapycoder
  • The Bison solution looks cumbersome. I was hoping for more convenient and human-friendly tools. - Vladimir Gordeev