The question is simple, but I can't find the answer: how do I parse data like this with Perl regular expressions?

start ..... load... starting... .... start ..... load... starting... .... 

There is a start keyword that marks where each structure begins, but nothing marks where it ends. The data is just the same structure repeated, always beginning the same way. Every parser I have tried loses at least the last structure (it never gets matched):

 (start\s).+?(\1) (?:(start\s)).*?(?1) 

I can't write a regex that finds each structure of the form start ..... load ... starting ... .... P.S. Slightly off topic, but also interesting: [^a] excludes the character "a". Is it possible to exclude the phrase "no"?

  • Add the language, sample data and the expected result. - user207618
  • Explain exactly what you want to get as output. And what does the word "Audut" have to do with it? It is not in the sample data. - PinkTux
  • If your language has split, split on the expression /start\s/ and you get an array of text structures. - Mike
  • A sequence of letters such as no can be excluded with (?!no) - Mike
  • There is split, but it is part of the programming language. I can write line-by-line parsing with split and without regexes, but I would like to learn regular expressions. - Reiko Reiko

3 answers

 /(start\s).*?(?=\1|$)/gs 

Test at regex101.com
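A minimal sketch of how that regex might be used from Perl, assuming the sample data is held in one string (the variable names are illustrative). Because the lookahead (?=\1|$) only checks for the next "start " without consuming it, the following match can begin there, so no structure is lost, unlike a pattern that matches (\1) outright:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample data from the question, assumed to be held in a single string.
my $data = 'start ..... load... starting... .... start ..... load... starting... .... ';

my @records;
# /g resumes each search where the previous match ended; the lookahead
# checks for the next "start " (or the end) without consuming it.
while ($data =~ /(start\s).*?(?=\1|$)/gs) {
    push @records, $&;    # $& is the whole match, not just group 1
}

print scalar(@records), " records\n";
print "[$_]\n" for @records;
```

Each record keeps the whitespace that precedes the next "start ", so trim it if that matters.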

But if you are writing in Perl (as indicated in the question), it is better to use the split function: @arr = split(/start\s/, $data); This gives an array of structures, @arr, which you can then process however is convenient.
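A short sketch of that split call on the sample string, again assuming the data is in one string. Since the data begins with the separator, split returns an empty first field, which is usually filtered out:

```perl
use strict;
use warnings;

my $data = 'start ..... load... starting... .... start ..... load... starting... .... ';

# split keeps a leading empty field (the string starts with "start "),
# but drops trailing empty fields.
my @raw = split /start\s/, $data;

# Keep only the non-empty structures.
my @arr = grep { length } @raw;

print scalar(@raw), " fields, ", scalar(@arr), " structures\n";
```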

A sequence of several characters can be excluded with a negative lookahead: (?!no)
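A character class such as [^a] can only exclude single characters. One common way to exclude a whole sequence is a "tempered" dot, (?:(?!no).)*, which checks the lookahead before consuming each character. The sketch below uses a made-up word list for illustration, then applies the same idea to the question's data, so that each record is "start " followed by anything that does not begin another "start ":

```perl
use strict;
use warnings;

# Part 1: exclude the sequence "no" rather than a single character.
my @words = qw(note piano yes start);    # illustrative data only
# The whole word must be made of characters where "no" never begins.
my @without_no = grep { /^(?:(?!no).)*$/ } @words;
print "@without_no\n";    # yes start

# Part 2: the same technique on the question's data.
my $data = 'start ..... load... starting... .... start ..... load... starting... .... ';
# No capturing groups, so m//g in list context returns the whole matches.
my @records = $data =~ /start\s(?:(?!start\s).)*/gs;
print scalar(@records), "\n";    # 2
```

This version needs no backreference: each record simply stops where the next "start " would begin.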

     // split on "start" surrounded by non-word characters, then drop empty strings
     let content = document.querySelector('div').innerHTML;
     console.info(content.split(/\Wstart\W/).filter(e => e));
     <div> start ..... load... starting... .... start ..... load... starting... .... </div> 

      No regexes are needed here. More precisely, they are needed, but only as the separator for split (and for the cleanup afterwards):

       #!/usr/bin/env perl
       use Modern::Perl;
       use DDP;

       # Each record is assumed to start on its own line,
       # because ^ under /m matches only at the start of a line.
       my $text = "start ..... load... starting... ....\n"
                . "start ..... load... starting... ....\n";

       my @rc = split( /^start\s+/m, $text );

       # map trims leading and trailing whitespace from each element
       # (/g is needed to strip both ends); grep drops empty elements
       @rc = grep { length } map { s/^\s+|\s+$//g; $_ } @rc;

       p @rc;

      Output:

       [ [0] "..... load... starting... ....", [1] "..... load... starting... ...." ]