Regular expressions. Searching for words by a set of letters

Question

Please help me create a regular expression that finds words built only from a given set of letters. For example, take the word "зажигалка" ("lighter"). The words found may use only letters from this word, and in the same quantity.

I need to find all words consisting of these letters, e.g. "глаз" (eye), "газ" (gas), "галка" (jackdaw), etc. In short, as in the game "Words from the Word".

Answer 1 · 2016-05-10T00:04:15

If we are talking about rearranging letters, then:

^(?=.*з)(?=.*а.*а.*а)(?=.*ж)(?=.*и)(?=.*г)(?=.*л)(?=.*к)[зажиглк]{9}$
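The pattern above is written out by hand for зажигалка. As a minimal sketch of how it could be generated for an arbitrary source word, the Python below builds the same lookahead structure from letter counts. Python itself and the helper name `anagram_pattern` are assumptions made for illustration and are not part of the answer.

```python
import re
from collections import Counter


def anagram_pattern(word: str) -> str:
    """Build a regex that matches only rearrangements of `word` (hypothetical helper)."""
    counts = Counter(word)  # keeps letters in order of first appearance
    # One positive lookahead per distinct letter: it must occur at least n times.
    lookaheads = "".join(
        "(?=" + "".join(".*" + re.escape(ch) for _ in range(n)) + ")"
        for ch, n in counts.items()
    )
    # The character class plus the exact length rules out any other letters.
    letters = "".join(re.escape(ch) for ch in counts)
    return f"^{lookaheads}[{letters}]{{{len(word)}}}$"


pattern = re.compile(anagram_pattern("зажигалка"))
print(anagram_pattern("зажигалка"))
print(bool(pattern.match("аааглзжик")))  # a rearrangement of the source word
print(bool(pattern.match("газ")))        # too short, so not a rearrangement
```

The minimum counts demanded by the lookaheads add up to the full length of the word, so together with the `{9}` length they pin every letter to its exact count.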

If we are just talking about a subset, then:

 ^(?!.*з.*з)(?!.*а.*а.*а.*а)(?!.*ж.*ж)(?!.*и.*и)(?!.*г.*г)(?!.*л.*л)(?!.*к.*к)[зажиглк]+$

Qwertiy ♦


Not quite. As I understand it, this pattern looks for words of exactly 9 letters, but I need to find all words consisting of these letters, e.g. "глаз", "газ", "галка", etc. In short, as in the game "Words from the Word". - Stas P.
@StasP., I have extended the answer. - Qwertiy ♦
@Qwertiy, and in which regular expression dialect should it work as the question requires? - aleksandr barakin
@alexanderbarakin, in any? I checked in js. And in which one does it not work (provided negative lookahead is supported)? - Qwertiy ♦
@Qwertiy, of the dialects known (and available) to me, it only works in pcre. Yes, you have half refuted my statement, but only half: additional code is still required to convert the original character set (I proceed from my understanding of the requirements: the source word must be arbitrary). - aleksandr barakin


Community spirit ♦ one · Answer 2 · 2016-05-10T12:55:03

The "in that quantity" condition (more precisely, as is clear from the comment on the other answer, "in a quantity not greater than the specified one") cannot be fulfilled with the regular expression standards known to me. That is, of course, if you use only the regular expression engine, without additional code (see below).

If the condition about the quantity is dropped and the words come one per line, then, for example, like this:

 $ echo -e 'глаз\nгаз\nмозг\nгалка' | grep '^[зажигалка]\+$'
 глаз
 газ
 галка

By additional code, I mean:

Convert the source word into a sorted list of letters, with a quantifier for letters that occur more than once. For example, convert зажигалка to ^а{1,3}?г?ж?з?и?к?л?$ (or, slightly differently, using only counted quantifiers and no ? quantifier: ^а{0,3}г{0,1}ж{0,1}з{0,1}и{0,1}к{0,1}л{0,1}$).
Convert each input word into a sorted list of its letters. For example: мозг → гзмо, глаз → агзл, заза → аазз.
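The two transformations can be sketched in Python as follows. This is only an illustration under assumptions: the function names `sorted_pattern` and `sort_letters` are hypothetical, and letters are ordered by Unicode code point rather than by the locale collation that `sort` would use. Because both sides use the same ordering, the comparison stays consistent.

```python
import re
from collections import Counter


def sorted_pattern(source: str) -> str:
    """Regex over sorted letters: each letter may occur 0..n times (hypothetical helper)."""
    counts = Counter(source)
    body = "".join(
        f"{re.escape(ch)}{{0,{n}}}" for ch, n in sorted(counts.items())
    )
    return f"^{body}$"


def sort_letters(word: str) -> str:
    """Rearrange the letters of a candidate word into sorted order."""
    return "".join(sorted(word))


pattern = re.compile(sorted_pattern("зажигалка"))
print(sorted_pattern("зажигалка"))
for word in ["мозг", "глаз", "заза"]:
    key = sort_letters(word)
    print(word, key, bool(pattern.match(key)))
```

Note that this form of the pattern also accepts an empty string, so empty input lines would need to be filtered out separately.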

If these transformations are performed, then an engine that understands bre (basic regular expressions) will do the job:

 $ echo -e 'гзмо\nагзл\nаазз' | grep '^а\{1,3\}\?г\?ж\?з\?и\?к\?л\?$'
 агзл

An example implementation of the described transformations using posix utilities:

Converting the source word to a bre regular expression:

 $ echo 'зажигалка' | sed 's/./&\n/g;s/.$//' | sort | uniq -c | \
   sed -r 's/\s*([0-9]+)\s*(.*)/\2\\{0,\1\\}/;1s/^/^/;$s/$/$/' | \
   sed ':a;N;s/\n//;ta'
 ^а\{0,3\}г\{0,1\}ж\{0,1\}з\{0,1\}и\{0,1\}к\{0,1\}л\{0,1\}$

Sorting the letters:

 $ echo 'глаз' | sed 's/./&\n/g;s/.$//' | sort | sed ':a;N;s/\n//;ta'
 агзл

Update: in his answer, Qwertiy demonstrated that the second of the transformations I described can be dispensed with if you use the pcre standard (perl compatible regular expressions), which supports look-ahead.

In this case, the first conversion can be done with posix utilities, for example, like this:

 $ w='зажигалка'; echo $w | sed 's/./&\n/g;s/.$//' | sort | uniq -c | \
   sed -r 's/^\s*([0-9]+)\s*(.)$/echo \\(?!\\(.*\2\\)\\{$((\1+1))\\}\\)/e;1s/^/^/;$s/$/['$w']+$/' | \
   sed ':a;N;s/\n//;ta'
 ^(?!(.*а){4})(?!(.*г){2})(?!(.*ж){2})(?!(.*з){2})(?!(.*и){2})(?!(.*к){2})(?!(.*л){2})[зажигалка]+$

The resulting regular expression works correctly:

 $ echo -e 'мозг\nглаз\nзаза' | grep -P '^(?!(.*а){4})(?!(.*г){2})(?!(.*ж){2})(?!(.*з){2})(?!(.*и){2})(?!(.*к){2})(?!(.*л){2})[зажигалка]+$'
 глаз
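For readers without GNU sed, the same conversion can be sketched in Python. This is an assumption-laden illustration rather than the answer's method: the helper name `subset_pattern` is hypothetical, and Python's `re` module stands in for pcre (it supports the same negative lookahead syntax used here).

```python
import re
from collections import Counter


def subset_pattern(source: str) -> str:
    """Regex for words that use each letter of `source` at most as often as it occurs there."""
    counts = Counter(source)
    # For a letter occurring n times, forbid n + 1 occurrences anywhere in the word.
    bans = "".join(
        f"(?!(.*{re.escape(ch)}){{{n + 1}}})" for ch, n in sorted(counts.items())
    )
    # Like the shell version, reuse the source word itself as the character class.
    allowed = "".join(re.escape(ch) for ch in source)
    return f"^{bans}[{allowed}]+$"


pattern = re.compile(subset_pattern("зажигалка"))
print(subset_pattern("зажигалка"))
print([w for w in ["мозг", "глаз", "заза"] if pattern.match(w)])
```

Repeated letters inside the character class are harmless, which is why the source word can be reused there unchanged.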

Such a requirement cannot be satisfied with the regular expression engines known to me: additional program code will definitely be required.
alexandr barakin, was there a condition about Unix in the question?
I showed an example of a regular expression for the bre (basic regular expressions) standard.
"with the help of the standards of regular expressions known to me it is impossible to fulfill" - disagree :)
