I have a file with VBA functions and procedures, and I need to extract the first (declaration) line of each one using a regular expression.

    my $s = 'Public Function SheetReMake(ByVal sheetName As String) As Worksheet \' creates a new sheet; if one already exists, deletes it';
    my @arr = ($s =~ m/^((public|private)*\s*(function|sub)+\s+(\w+)\(.*?\)(.*?))+$/igm);
    print $#arr."\n";
    for my $i (0..$#arr) {
        print $arr[$i]."\n";
    }

Instead, "Public", "Private", "Sub", "Function", etc. end up in the @arr array as separate elements, alongside the whole matched lines. What am I doing wrong?

  • All of your groups are capturing. If a group is not needed in the result, put ?: right after its opening parenthesis: Ru.wikipedia.org/wiki/… - Visman
  • Thanks! That is already better, but now it does not find all occurrences, and I have not figured out why yet. - klaus65sem
  • And now I cannot access the data from the groups ($1 ... $5). The code is now: my @arr = ($s =~ /^((?:public|private)*\s*(?:function|sub)+\s+(?:\w+)\(.*?\)(?:.*?))+$/igm); - klaus65sem
  • By the way, the question does not say whether you need the data from the individual groups or not, because all that the suggested ?: does is make those parentheses non-capturing. - Mike
  • Mea culpa... Probably the right approach is to first get an array of whole lines without capturing substrings, and then parse each line in a loop and work with it further. At the start of the line, public or private may be omitted; if present, it is followed by a space. Then comes the mandatory keyword function or sub, then a space and the mandatory function name, followed by mandatory parentheses (empty or with an argument list). For a function, the return type follows, and then an optional comment introduced by a single quote. - klaus65sem

2 answers

The way you process the string puts every capture group into a separate element of the resulting array: a match with /g in list context returns all captured substrings from all matches as one flat list. That is why the loop shows each group as a separate element. If you need to work with the capture groups individually, you can, for example, iterate over the matches with a while loop:

    #!/usr/bin/perl
    my $s = 'Public Function SheetReMake(ByVal sheetName As String) As Worksheet \' creates a new sheet; if one already exists, deletes it';
    # In scalar context, /g continues from the end of the previous match on each iteration
    while ($s =~ /((public|private)*\s*(function|sub)+\s+(\w+)\(.*?\)(.*?))/igm) {
        print "Type: ", $2, " F/S:", $3, " Name:", $4, " Full text:", $1, "\n";
    }

An example on ideone.com
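The same loop can be extended to a multi-line text and to the line layout described in the comments (optional Public/Private, then Function or Sub, a name, parentheses, and the rest of the line). The following is only a sketch under assumptions: the sample text, the group names (scope, kind, name, args, rest) and the @decls array are made up for illustration, and named captures need Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input; the real text would be read from the VBA file.
my $text = <<'VBA';
Public Function SheetReMake(ByVal sheetName As String) As Worksheet ' recreates the sheet
End Function
Sub Main()
End Sub
VBA

my @decls;
# ^ and $ match at line boundaries because of /m; /g walks through all matches.
while ($text =~ /^[ \t]*(?:(?<scope>public|private)\s+)?(?<kind>function|sub)\s+(?<name>\w+)\((?<args>.*?)\)(?<rest>.*)$/igm) {
    push @decls, {
        scope => $+{scope} // '',   # empty when Public/Private is omitted
        kind  => $+{kind},
        name  => $+{name},
        args  => $+{args},
        rest  => $+{rest},          # return type and optional ' comment
    };
}

printf "%s|%s|%s\n", $_->{scope}, $_->{kind}, $_->{name} for @decls;
```

Each element of @decls then holds one declaration, so the separate parts stay grouped per line instead of being flattened into one list.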

  • Thanks! This is more elegant than my idea with an intermediate array. - klaus65sem

You can also get the array directly, without a loop. To do that, capture only the text you need (similar in effect to a lookbehind).

 @matched = $text =~ m/(?:private|public)(?:.*$)\n(.*$)/igm; 

Here we:

  1. Match lines that contain private or public
  2. Do not capture that part, because of ?:
  3. Skip the rest of the text up to the end of the line: (?:.*$)
  4. Match the line break \n
  5. Capture the next line: (.*$)

i - match case-insensitively
g - do not stop at the first match
m - change the meaning of ^ and $, so that they match at the beginning/end of every line inside the text, not only at the start and end of the whole string
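To check what the pattern above actually returns, it can be run against a small text. This is a minimal sketch; the sample VBA text is hypothetical and not taken from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text for illustration only.
my $text = <<'VBA';
Public Function Foo() As Long
    Foo = 1
End Function
private sub Bar()
    Debug.Print "x"
End Sub
VBA

# Same pattern as above: the only capturing group is the line
# that follows a line containing private or public.
my @matched = $text =~ m/(?:private|public)(?:.*$)\n(.*$)/igm;

print scalar(@matched), " captured line(s)\n";
print "[$_]\n" for @matched;
```

Because the only capturing group comes after \n, each element of @matched is the line following a declaration, not the declaration line itself.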


Your example is essentially just missing the non-capturing groups. That is, it should look something like this:

 /(?:public|private)*\s*(?:function|sub)+\s+(\w+)\(.*?\)(.*?)/igm 

Now the keywords will not be captured.
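If only the whole declaration lines are needed, as in the question, one option is to keep a single capturing group around the entire line and make every inner group non-capturing. This is a sketch of that idea, not the exact pattern from this answer: the anchors, the [ \t]* prefix for indented declarations, and the sample text are my assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input; the real text would be read from the VBA file.
my $text = <<'VBA';
Public Function SheetReMake(ByVal sheetName As String) As Worksheet ' recreates the sheet
    Set SheetReMake = Nothing
End Function
Private Sub Helper()
End Sub
Sub Main()
End Sub
VBA

# One capturing group per match, so list context yields one element per declaration.
my @lines = $text =~ /^[ \t]*((?:(?:public|private)\s+)?(?:function|sub)\s+\w+\(.*?\).*)$/igm;

print scalar(@lines), " declaration(s)\n";
print "$_\n" for @lines;
```

Anchoring with ^ keeps lines such as End Function and End Sub from matching, since they do not start with an optional Public/Private followed by Function or Sub.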

  • @klaus65sem: Does this answer still not solve your problem? - Eugen Konkov