Hello. I need to strip every #define out of a .cpp source file, and because a #define can span several lines, I can't build a correct regexp. This is my current attempt (it fails when the define spans more than 3 lines):

 #d.*?(?(?=\\\r?\n)(.*?\n.*?\n)|\n)+ 
  • The single-line and multi-line patterns could be handled separately, but I'm too lazy to write the expression. In theory, a multi-line pattern consists of the first line, without the '\' character, followed by several parts, each covering a line-continuation character and the next line of the define: (\\$^.*) - jmu
  • Are you sure the task is fully specified? Haven't you forgotten about comments? In the general case you also have to think about #ifdefs. IMHO regexps won't work here. - avp 2:46 pm
  • The task is fully specified. What you describe has already been handled. - Veikedo
  • @alexlz, if regexps are so simple, why do people constantly get confused by them (and ask questions)? If this problem is fully specified, it can be solved in C in half an hour at most (and without questions). - avp
  • But I never said they were simple. - alexlz

3 answers

I tried to account for every way a define can be written:

  1. Before # there can be any number of spaces, tabs and escaped line breaks.
  2. The same applies between # and define.
    Screenshot of the result: [screenshot]
    The regular expression itself:

     $RE = <<<HEREDOC
     ^                        # start of a line or of the text
     (?P<probel>
       (?:
         [ \\t] |             # spaces, tabs, or
         \\\\                 # a backslash followed by
         \\r?                 # an optional CR
         \\n                  # and an LF
       )*+
     )
     [#]
     (?P>probel)              # this reference is called recursive, but there is no actual recursion
     define(?=[^a-z0-9_]|\$)  # define followed by a non-identifier character or the end of the line
     # the grammar needs refining
     (?:
       \\\\[^\\r\\n] |        # something escaped, but not a line break, or
       \\\\\\r?\\n |          # an escaped line break, or
       [^\\r\\n]              # any character except a line break
     )*+
     \$                       # end of a line or of the text
     HEREDOC;
     echo preg_replace("/$RE/xum", "<span style='color:white;background-color:blue'>$0</span>", $text);

A live example is on IDEone.
Comments are welcome. By the way, I don't know exactly what may follow define. I assume it is anything except [a-z0-9_].
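
For readers who want to try the idea outside PHP, here is a rough Python sketch of the same pattern. It is an adaptation, not the author's code: Python's `re` has no `(?P>name)` subroutine call, so the whitespace fragment is a string (the hypothetical name `WS`) that is spliced in twice; the possessive `*+` is replaced by mutually exclusive branches; and the lookahead also rejects uppercase letters, which is my assumption about what may follow define.

```python
import re

# Whitespace that may surround '#': spaces, tabs, or a backslash
# immediately followed by a line break (LF or CRLF).
WS = r"(?:[ \t]|\\\r?\n)*"

DEFINE_RE = re.compile(
    r"^" + WS + "#" + WS          # line start, optional whitespace around '#'
    + r"define(?![A-Za-z0-9_])"   # the word define, not a longer identifier
    # Rest of the directive: an ordinary character, an escaped character,
    # or an escaped line break. The branches are mutually exclusive, so the
    # loop simply runs to the first line break that is not escaped.
    + r"(?:[^\\\r\n]|\\[^\r\n]|\\\r?\n)*",
    re.MULTILINE,
)

def find_defines(text):
    """Return every #define directive in text, continuation lines included."""
    return DEFINE_RE.findall(text)

sample = (
    "int a;\n"
    "  # \\\n"
    " define MAX(a, b) \\\n"
    "    ((a) > (b) ? (a) : (b))\n"
    "#defined_elsewhere 1\n"
    "#include <stdio.h>\n"
)

for directive in find_defines(sample):
    print(repr(directive))
```

Like the original, this sketch knows nothing about comments or string literals.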

  • Don't forget about comments :-) ideone.com/wGjcFY - VladD
  • @VladD, if you account for comments and the contents of strings, you end up with almost a complete C++ parser :) So I'll probably skip them. - ReinRaus
  • Well :-) You had a comment parser somewhere, as I recall. (Here's another case: ideone.com/4USVwo.) - VladD
  • @VladD, here: ideone.com/n9jb4k. It searches for defines while taking strings, single-line comments and multi-line comments into account. char literals don't seem to affect anything, so I left them out. You've made me indulge in madness again :) - ReinRaus
  • Really, @ReinRaus, you should publish a series of lectures on programming with regexps in Researches. Then, you'll see, others would contribute more or less complete example implementations in different languages. It could turn out to be a useful topic for everyone, truly in the spirit of a knowledge base. - avp
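
The comment-aware search discussed in these comments can be approximated with a "match and skip" alternation: comments and literals are matched first and put back unchanged, so a #define inside them is never seen by the define branch. The Python sketch below is my own illustration and not the code behind the ideone links; the names `TOKEN` and `strip_defines_skipping_comments` are hypothetical. It assumes no raw string literals, no comment in front of the `#` on the same line, and no block comment that starts inside a define and ends on a later line.

```python
import re

WS = r"(?:[ \t]|\\\r?\n)*"   # spaces, tabs, or backslash + line break

TOKEN = re.compile(
    r"(?P<skip>"
    r"/\*.*?\*/"                          # block comment
    r"|//(?:[^\\\n]|\\\r?\n|\\.)*"        # line comment, may continue after backslash-newline
    r'|"(?:[^"\\\n]|\\.)*"'               # string literal
    r"|'(?:[^'\\\n]|\\.)*'"               # character literal
    r")"
    r"|(?P<define>^" + WS + "#" + WS + r"define(?![A-Za-z0-9_])"
    r"(?:[^\\\r\n]|\\[^\r\n]|\\\r?\n)*)", # rest of the directive
    re.MULTILINE | re.DOTALL,
)

def strip_defines_skipping_comments(source):
    """Remove #define directives, leaving comments and literals untouched."""
    def replace(match):
        # Skipped tokens are copied back verbatim; defines become empty.
        return match.group(0) if match.group("skip") is not None else ""
    return TOKEN.sub(replace, source)

source = (
    "/*\n"
    "#define IN_COMMENT 1\n"
    "*/\n"
    'const char *s = "\\\n'
    '#define IN_STRING 2";\n'
    "#define REAL 3\n"
    "int x; // see #define\n"
)
print(strip_defines_skipping_comments(source))
```

Only the `REAL` directive is removed here; its line break is left in place, so line numbers in the rest of the file do not shift.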

This question has already been answered on Stack Overflow.

For those who don't like following links:

 '(?m)^#define (?:.*\\\r?\n)*.*$' 

I haven't checked it myself; they say it should work for constructs like

 #define max(a,b) \
   ({ typeof (a) _a = (a); \
      typeof (b) _b = (b); \
      _a > _b ? _a : _b; })
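
A quick way to check that claim is to run the quoted pattern over such a macro. The snippet below is a small Python sketch (Python is my choice for illustration; the helper name `strip_simple_defines` is hypothetical). It uses the pattern exactly as quoted and also includes an indented `# define` line to show what the pattern does not cover.

```python
import re

# The pattern quoted above, unchanged.
SIMPLE_DEFINE = re.compile(r"(?m)^#define (?:.*\\\r?\n)*.*$")

def strip_simple_defines(source):
    """Delete every match of the quoted pattern; the line break after it stays."""
    return SIMPLE_DEFINE.sub("", source)

source = (
    "#define max(a,b) \\\n"
    "  ({ typeof (a) _a = (a); \\\n"
    "     typeof (b) _b = (b); \\\n"
    "     _a > _b ? _a : _b; })\n"
    "int x;\n"
    "  # define INDENTED 1\n"
)
print(strip_simple_defines(source))
```

In Python's `re` the four-line macro is removed as a whole, while the indented directive with a space after `#` is left alone. Other regex engines may treat `$` and line endings differently.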
  • But in a real program there can be spaces, tabs and escaped newlines before the '#' and between the '#' and define. For example, garbage like this (each lone \ is followed by a line break): \ # \ \ define \ xaxa \ 22 \ main () {printf ("%d\n", xaxa); } compiles and works. - avp
  • Well, adding a check for whitespace is no problem. The grammar stays regular. - northerner
  • Maybe. But won't the quoted regexp throw out #include as well? I don't trust programs with more or less complex logic built on regexps at all. In most cases they contain bugs that show up rarely and therefore never get debugged. - avp
  • No, it doesn't work. It works only for the constructs shown (and only after adding ? before $). - Veikedo
  • In general this case can't be solved with a regex, because it requires recursion. I got around it by simply gluing the multi-line defines together. avp, the original task was completely different. - Veikedo
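
The "gluing" approach mentioned in the last comment can be sketched without any multi-line regex: join each run of backslash-continued lines into one logical line, then test that logical line for a define. The Python code below is a minimal illustration under my own assumptions, not Veikedo's actual solution; the function name `strip_defines_by_gluing` is hypothetical, and comments and string literals are ignored.

```python
import re

# A logical line is a define if it starts with '#', then 'define' as a whole word.
DEFINE = re.compile(r"[ \t]*#[ \t]*define(?![A-Za-z0-9_])")
# Physical lines, each keeping its trailing line break if it has one.
LINE = re.compile(r"[^\n]*\n|[^\n]+")

def strip_defines_by_gluing(source):
    """Drop every physical line that belongs to a #define directive."""
    kept = []
    physical = []   # physical lines of the logical line being assembled
    logical = ""    # the same lines glued together, backslashes removed
    for line in LINE.findall(source):
        physical.append(line)
        body = line.rstrip("\r\n")
        if body.endswith("\\") and body != line:
            # Backslash directly before the line break: glue and keep reading.
            logical += body[:-1]
            continue
        logical += body
        if not DEFINE.match(logical):
            kept.extend(physical)
        physical, logical = [], ""
    # The text ended in the middle of a continuation.
    if physical and not DEFINE.match(logical):
        kept.extend(physical)
    return "".join(kept)

source = (
    "\\\n"
    "# \\\n"
    "\\\n"
    "define \\\n"
    "xaxa \\\n"
    "22\n"
    "int main() { return xaxa; }\n"
)
print(strip_defines_by_gluing(source))
```

Because the continuation is undone before the check, this also handles escaped line breaks before the `#` and between `#` and define, and other directives such as #include are left in place.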

Nothing is impossible.
Atomic grouping, or Not a step back!

  • Impressive. I especially liked this comment: "Once, a quantum physics textbook was the suggested cure for insomnia; now an evening spent parsing an incoming SMS with regexps is enough. Sleep soundly: the title of pervert and a chair in your hand will certainly be yours." - avp