I am a student, and as part of my coursework I have been given the task of finding source-code clones in a C project.

There is no need to build a tree for the entire project. It is assumed that there are some source fragments whose clones need to be found, so in essence I only need to build trees for arbitrary source code fragments and then compare them, allowing for the fact that some subtrees may differ.

I have already implemented clone search based on parameterizing some lexemes in the fragments. Now I need to move on to comparing these arbitrary fragments as trees, since this makes it possible to find more clones.

I have searched on my own, but decided it would be sensible to ask more experienced people for advice. Are there any tools (preferably free) that can build syntax trees for arbitrary fragments of C source code and output the result in some text format, to standard output or to a file, so that the program I am writing can read it?

Or are there libraries for C++ with which I can obtain a representation of these arbitrary fragments inside my own program?

1 answer

As @PinkTux correctly suggests in the comments, you should not try to write the parser yourself: this is a big and difficult job that does not directly relate to your task.

Parsing a language into a syntax tree is traditionally done with the lex and yacc utilities (their more modern versions are flex and bison). You will have to learn their input formats, but believe me, this will come in handy more than once. Flex and bison are free, actively developed, and supported.

There are several ready-made grammars for the C language; the Internet seems to consider this pair to be the best: http://www.quut.com/c/ANSI-C-grammar-l-2011.html + http://www.quut.com/c/ANSI-C-grammar-y.html. So do not reinvent the wheel. Take the ready-made grammar, add classes for constructing the syntax tree, and you can proceed to your main task.

[You will have to add code for building the corresponding part of the tree to rules like WHILE '(' expression ')' statement, for example:

 WHILE '(' expression ')' statement { $$ = new while_statement($3, $5); }

which means “assign to the result of the rule ( $$ ) a new while_statement built from $3 (the value of the third symbol in the rule, expression ) and $5 (the value of the fifth symbol, statement )”, but this is trivial.]
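As a rough illustration of what such tree classes and a later comparison might look like, here is a minimal sketch in C++. Everything in it is an assumption made for the example rather than part of the grammar above: the generic Node type, the leaf helper, the dump format, and the count_differences function are hypothetical names. For brevity, while_statement is a factory function here instead of a class. Leaf text is ignored during comparison, which mirrors the lexeme parameterization mentioned in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical generic tree node: a kind label ("while", "ident", ...),
// optional lexeme text, and the child subtrees.
struct Node {
    std::string kind;
    std::string text;
    std::vector<std::unique_ptr<Node>> children;
};

using NodePtr = std::unique_ptr<Node>;

// Builds a leaf such as an identifier or a constant.
NodePtr leaf(std::string kind, std::string text) {
    auto n = std::make_unique<Node>();
    n->kind = std::move(kind);
    n->text = std::move(text);
    return n;
}

// What an action for WHILE '(' expression ')' statement could build
// from the values of $3 (condition) and $5 (body).
NodePtr while_statement(NodePtr cond, NodePtr body) {
    assert(cond && body);
    auto n = std::make_unique<Node>();
    n->kind = "while";
    n->children.push_back(std::move(cond));
    n->children.push_back(std::move(body));
    return n;
}

// Text dump in S-expression form, so another program can read the tree.
std::string dump(const Node& n) {
    std::string out = "(" + n.kind;
    if (!n.text.empty()) out += ":" + n.text;
    for (const auto& c : n.children) out += " " + dump(*c);
    return out + ")";
}

// Counts differing subtrees. A mismatch in kind or in the number of
// children counts as one difference for that whole subtree. Leaf text is
// deliberately ignored, so renamed identifiers still match.
std::size_t count_differences(const Node& a, const Node& b) {
    if (a.kind != b.kind || a.children.size() != b.children.size()) return 1;
    std::size_t diff = 0;
    for (std::size_t i = 0; i < a.children.size(); ++i)
        diff += count_differences(*a.children[i], *b.children[i]);
    return diff;
}
```

Two fragments could then be treated as clones when count_differences stays below some threshold you choose. The threshold and the decision to ignore leaf text are design choices for your task, not something the grammar dictates.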

Go for it! You have a wonderful task!