Parsing text with Perl regex

Question

The question is simple, but I can’t find the answer. I can’t figure out how to parse data like this with Perl regular expressions:

start ..... load... starting... .... start ..... load... starting... ....

There is a start keyword that marks where each match should begin, but there is no end marker. The same data structure simply repeats, always beginning the same way. Every parser I have tried loses at least the last structure (it never gets matched):

 (start\s).+?(\1)
 (?:(start\s)).*?(?1)

I can't write a regular expression that finds the start ..... load ... starting ... .... structures. P.S. Off topic, but also interesting: [^a] excludes the character "a"; is it possible to exclude the phrase "no"?

And what does the word "Audut", which is not in the sample data, have to do with it?
If your language has a split function, split on the expression /start\s/ and you get an array of text structures.
There is a split, but that is a feature of the programming language.
I can write a line-by-line parser with split and without regexes, but I would like to learn regular expressions.

Accepted Answer · 2016-10-21T10:40:46

 /(start\s).*?(?=\1|$)/gs

Test at regex101.com
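
A minimal sketch of how this pattern might be used from Perl, assuming the sample data sits on a single line in a variable called $data (the variable names are illustrative, not from the question). Because the pattern contains a capture group, a list-context match with /g would return only the captured "start " strings, so the sketch loops and reads the whole match from $& instead:

```perl
use strict;
use warnings;

# Sample data from the question, assumed here to be a single line.
my $data = 'start ..... load... starting... .... start ..... load... starting... ....';

my @blocks;
# The lookahead (?=\1|$) stops each match just before the next "start "
# (or at the end of the string) without consuming it, so the last
# structure is found as well.
while ( $data =~ /(start\s).*?(?=\1|$)/gs ) {
    push @blocks, $&;    # $& holds the whole matched text
}

print "[$_]\n" for @blocks;
```

Each element of @blocks still begins with "start ", and the first one keeps the space that precedes the next keyword. Note that "starting" is not treated as a boundary, since the pattern requires whitespace right after "start".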

But it is better to use the split function. If you are writing in Perl (as the question indicates), it looks like this: @arr=split(/start\s/,$data); This gives an array of structures, @arr, which you can then process however is convenient.
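
As a rough sketch of the split approach, again assuming the sample data is held on one line in $data: since the string begins with the separator, split returns an empty first field, which the grep below drops.

```perl
use strict;
use warnings;

# Sample data from the question, assumed here to be a single line.
my $data = 'start ..... load... starting... .... start ..... load... starting... ....';

# split yields an empty leading field because $data starts with "start ";
# grep { length } keeps only the non-empty fields.
my @arr = grep { length } split( /start\s/, $data );

print scalar(@arr), " structures\n";
print "[$_]\n" for @arr;
```

Unlike the regex version, the elements here no longer contain the "start " keyword itself.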

A combination of several characters can be excluded with a negative lookahead: (?!no)
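
One possible way to apply (?!no), shown on made-up sample strings that are not from the question: placing the lookahead before every character keeps only the strings that do not contain "no" anywhere.

```perl
use strict;
use warnings;

# Hypothetical sample strings, invented for this illustration.
my @samples = ( 'yes sir', 'nothing here', 'say no more' );

# (?!no) asserts that the next two characters are not "no".
# Repeating "(?!no)." for every character therefore matches only
# strings in which "no" does not occur at any position.
my @without_no = grep { /^(?:(?!no).)*$/s } @samples;

print "$_\n" for @without_no;
```

Only 'yes sir' passes the filter here; the other two strings contain "no" at some position.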

user207618 · Answer 2 · 2016-10-21T10:44:03

 let content = document.querySelector('div').innerHTML;
 console.info(content.split(/\Wstart\W/).filter(e => e));

 <div> start ..... load... starting... .... start ..... load... starting... .... </div>

Answer 3 · 2016-10-21T10:48:17

No regexps are needed here. More precisely, they are needed, but only as the separator for split (and for further cleanup):

 #!/usr/bin/env perl
 use Modern::Perl;
 use DDP;

 my $text = <<END;
 start ..... load... starting... ....
 start ..... load... starting... ....
 END

 my @rc = split( /^start\s+/m, $text );

 # map strips leading and trailing whitespace from each element
 # (the /g flag lets both alternatives apply);
 # grep then removes the empty elements of the array
 @rc = grep { $_ } map { $_ =~ s/^\s+|\s+$//g; $_ } @rc;

 p @rc;

Output:

 [
     [0] "..... load... starting... ....",
     [1] "..... load... starting... ...."
 ]
