Regular expression for selecting alphabetic sequences without gaps

Question

I need to write a terminal command using grep or sed. It should output only the matched chunks from a text file (one per line or in a column, it does not matter). Perl cannot be used.

Currently I have this regular expression:

a?\s*b?\s*c?\s*d?\s*e?\s*f?\s*g?\s*h?\s*i?\s*j?\s*k?\s*l?\s*m?\s*n?\s*o?\s*p?\s*q?\s*r?\s*s?\s*t?\s*u?\s*v?\s*w?\s*x?\s*y?\s*z?\s*

Strangely enough, it also matches sequences such as "ace" or "bpxz". How can I make the expression match only sequences without missing letters, such as "abcd", "opqr", "xy"?

UPD: I forgot to add that spaces are ignored (that is what the \s* is for). The regular expression must find an alphabetic sequence anywhere in the text. For example, from the phrase "roll call" it should produce "class" and "dezh" (it was easier to come up with an example in Russian).

What language are you writing in, and could this be done by other means?
The regular expression comes out monstrous; other means might be easier...
@Daniil Do you have perl on this machine (it is on 90% of machines with grep)?
If the input file contains the string "roll call", what should the program output?
As I understand it, the matching pieces of the line; but only those pieces, or the entire line?
@Mike I need to write a command for the terminal using grep or sed.
It should output only the matched chunks from the text file (one per line or in a column, it does not matter).
In principle, the substring solution proposed below would do, but I cannot implement it.

Accepted Answer · 2016-10-17T15:13:37

Unfortunately, you did not say which regular expression dialect may be used or what this is for. There may be simpler solutions that rely on special features of a particular dialect, or simpler tools that do not use regular expressions at all.

For a PCRE-compatible dialect, the expression looks like this (shown up to the letter d; continue by analogy and add whitespace handling as needed):

 (?:a(?=b))?(?:b(?=c))?(?:c(?=d))?(?:d(?=e))?

Test on regex101.com
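
To see how this lookahead pattern behaves when extended to the whole alphabet, here is a minimal sketch in Python's `re` module rather than PCRE. The names `LETTERS`, `PATTERN` and `lookahead_matches` are illustrative, not part of the answer. Because each letter is consumed only when its successor follows, the last letter of a run is left out of the match.

```python
import re
import string

LETTERS = string.ascii_lowercase

# Build (?:a(?=b))?(?:b(?=c))?...(?:y(?=z))? for the whole alphabet.
PATTERN = re.compile(
    "".join(f"(?:{cur}(?={nxt}))?" for cur, nxt in zip(LETTERS, LETTERS[1:]))
)


def lookahead_matches(text):
    """Return the non-empty matches of the lookahead pattern in text."""
    return [m.group(0) for m in PATTERN.finditer(text) if m.group(0)]


# The final letter of each run is only looked at, never consumed.
print(lookahead_matches("abcd xy"))  # ['abc', 'x']
```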

As an example of such "special features", in Perl you can check any run of consecutive characters like this:

 echo "abpade fg xyz" | perl -npe 's/.*?((?:([a-z])\s*(?=(??{chr(ord($2)+1)})))+.)/$1\n/g'

 Result: ab de fg xyz

On most Unix systems Perl can be used in place of grep by writing the required command as a one-liner.

UPD: For the command line, using only grep and sed, here is a short version:

 echo "a bcefgkmoxyz" |\
 grep -Po `echo -n 'bcdefghijklmnopqrstuvwxyz' |\
 sed 's/./\0\0/g;s/^/a/;s/\(.\)\(.\)/\\\\s*(?:\1(?=\\\\s*\2))?/g;s/.$/./'` |\
 sed -n '/../p'

 Result: a bc efg xyz

The command is split across several lines for readability; it can be written on one line by removing the \ characters. I was too lazy to write out the long regular expression by hand, so the echo | sed command in backticks builds the required expression on the fly from the letters of the alphabet. Unfortunately, the expression is not ideal: grep also outputs single characters, so the final sed -n '/../p' is used to suppress them.

The grep parameter generated from the alphabet by these commands looks like this:

 \\s*(?:a(?=\\s*b))?\\s*(?:b(?=\\s*c))?\\s*(?:c(?=\\s*d))?\\s*(?:d(?=\\s*e))?\\s*(?:e(?=\\s*f))?\\s*(?:f(?=\\s*g))?\\s*(?:g(?=\\s*h))?\\s*(?:h(?=\\s*i))?\\s*(?:i(?=\\s*j))?\\s*(?:j(?=\\s*k))?\\s*(?:k(?=\\s*l))?\\s*(?:l(?=\\s*m))?\\s*(?:m(?=\\s*n))?\\s*(?:n(?=\\s*o))?\\s*(?:o(?=\\s*p))?\\s*(?:p(?=\\s*q))?\\s*(?:q(?=\\s*r))?\\s*(?:r(?=\\s*s))?\\s*(?:s(?=\\s*t))?\\s*(?:t(?=\\s*u))?\\s*(?:u(?=\\s*v))?\\s*(?:v(?=\\s*w))?\\s*(?:w(?=\\s*x))?\\s*(?:x(?=\\s*y))?\\s*(?:y(?=\\s*z))?.
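
For readers who want to experiment with the generated expression outside the shell, the following is a rough Python equivalent, offered as a sketch and not as the answer's own code. It assumes Python's `re` treats these constructs the same way as `grep -P` does for this input; `PATTERN` and `grep_like_matches` are made-up names. The length filter plays the role of the final `sed -n '/../p'`.

```python
import re
import string

LETTERS = string.ascii_lowercase

# Same shape as the generated grep parameter: one optional, lookahead-guarded
# group per letter pair, then a single "." that takes the run's last character.
PATTERN = re.compile(
    "".join(rf"\s*(?:{cur}(?=\s*{nxt}))?" for cur, nxt in zip(LETTERS, LETTERS[1:]))
    + "."
)


def grep_like_matches(text):
    """Return matches of at least two characters, like grep -o plus sed -n '/../p'."""
    return [m.group(0) for m in PATTERN.finditer(text) if len(m.group(0)) >= 2]


print(grep_like_matches("a bcefgkmoxyz"))  # ['a bc', 'efg', 'xyz']
```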

learp · Answer 2 · 2016-10-17T14:43:47

In general, you could build the string "abcdef...xyz" and check whether your string is a substring of it. Is a regular expression required?
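
One way the substring idea could be turned into an extractor is sketched below in Python. It is an illustration under assumptions, not the answerer's code: whitespace inside a run is skipped, only lowercase ASCII letters count, runs shorter than two letters are dropped, and `find_runs` is a hypothetical name.

```python
import string

ALPHABET = string.ascii_lowercase


def find_runs(text):
    """Return chunks of text whose letters, whitespace aside, form a substring of the alphabet."""
    runs = []
    i = 0
    while i < len(text):
        if text[i] not in ALPHABET:
            i += 1
            continue
        letters = text[i]   # letters collected so far, without whitespace
        end = i             # index of the last letter accepted into the run
        j = i + 1
        while j < len(text):
            ch = text[j]
            if ch.isspace():
                j += 1
            elif letters + ch in ALPHABET:  # the substring test from the answer
                letters += ch
                end = j
                j += 1
            else:
                break
        if len(letters) >= 2:
            runs.append(text[i:end + 1])
        i = end + 1
    return runs


print(find_runs("abpade fg xyz"))  # ['ab', 'de fg', 'xyz']
```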

Qwertiy ♦ · Answer 3 · 2016-10-17T15:15:35

Each letter must be followed by the next letter or by the end of the line:

 a?\s*(?=b|$) b?\s*(?=c|$) ... y?\s*(?=z|$) z?\s*$

Thus, if a letter is present, it is tied to the next one and skipping is not allowed.
Concatenating these pieces gives the desired expression. Just add a caret (^) at the beginning.
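
A possible way to assemble and try the concatenated expression is sketched here in Python. Two things are assumptions of the sketch and not of the answer: the names `PATTERN` and `is_gapless_line`, and the placement of each lookahead inside its optional group, so that the check applies only when the letter is actually present. With the lookahead outside the group, as the pieces are written above, a string such as "cd" would already be rejected at the first piece. The result validates a whole line; it does not extract chunks from the middle of a text.

```python
import re
import string

LETTERS = string.ascii_lowercase

# ^(?:a\s*(?=b|$))?(?:b\s*(?=c|$))?...(?:y\s*(?=z|$))?z?\s*$
PATTERN = re.compile(
    "^"
    + "".join(rf"(?:{cur}\s*(?={nxt}|$))?" for cur, nxt in zip(LETTERS, LETTERS[1:]))
    + r"z?\s*$"
)


def is_gapless_line(line):
    """True if the whole line is one alphabetic run without skipped letters."""
    return PATTERN.match(line) is not None


for sample in ["abcd", "opqr", "xy", "x y", "ace", "bpxz"]:
    print(sample, is_gapless_line(sample))
```

Note that an empty line and a single letter also pass this check, since every piece is optional.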

Answer 4 · 2017-01-02T17:21:46

Another option, also limited to a through d, but capturing all the characters. In one of the proposed options the last character of the sequence is not captured.

 (?(?=a)a(?=b))(?(?=b)b(?=c))(?(?=c)c(?=d))(?(?=d)d(?=e))

https://regex101.com/r/0W74IZ/2

An elegant but slow solution that modifies the text being processed:

 ([a-z]{2,})(?=.*\1[a-z]*$)

The idea is very simple: before processing the text, append the whole alphabet to its end:

 abcdefghijklmnopqrstuvwxyz

This way there is no need to build a huge regular expression; it is enough to write something like this in the code:

 preg_match_all( $re, $text.$allLiteralOrderer, $result)

https://regex101.com/r/0W74IZ/3
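
The same trick can be tried in Python, as in the sketch below. It makes two assumptions beyond the answer: the alphabet is appended after a newline so that the `[a-z]*$` part can only land inside the appended alphabet, and `re.DOTALL` lets `.*` cross that newline. `find_contiguous_runs` is a hypothetical name. This expression does not skip whitespace, so "de fg" comes out as two separate runs.

```python
import re
import string

ALPHABET = string.ascii_lowercase

# A candidate run is accepted only if it occurs again inside the trailing alphabet.
PATTERN = re.compile(r"([a-z]{2,})(?=.*\1[a-z]*$)", re.DOTALL)


def find_contiguous_runs(text):
    """Return runs of two or more consecutive alphabet letters found in text."""
    haystack = text + "\n" + ALPHABET  # separator is an assumption of this sketch
    return [m.group(1) for m in PATTERN.finditer(haystack)]


print(find_contiguous_runs("abpade fg xyz"))  # ['ab', 'de', 'fg', 'xyz']
```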
