Please help me write a regular expression that finds words built only from a given set of letters. For example, take the word "зажигалка" ("lighter"). The words being searched for may use only the letters of this word, and each letter no more often than it occurs in it.

I need to find all the words that can be made from these letters, i.e. "глаз" ("eye"), "газ" ("gas"), "галка" ("jackdaw"), etc. In general, as in the game "Words from the Word".

    2 answers

    If you need exact rearrangements of the letters (anagrams), then:

    ^(?=.*з)(?=.*а.*а.*а)(?=.*ж)(?=.*и)(?=.*г)(?=.*л)(?=.*к)[зажиглк]{9}$
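A quick way to see why this pattern accepts only rearrangements: the lookaheads demand at least three "а" and at least one of each of the six other letters, which already adds up to 9, and the final part allows exactly 9 characters. The sketch below checks this in Python's `re` module, which is an assumption made here (the answer's author reports testing in JS); the helper name `is_anagram` is made up for illustration.

```python
import re

# Pattern copied from the answer: every letter must occur (three times for "а"),
# and the whole string is exactly 9 characters from the allowed set.
ANAGRAM = re.compile(
    r"^(?=.*з)(?=.*а.*а.*а)(?=.*ж)(?=.*и)(?=.*г)(?=.*л)(?=.*к)[зажиглк]{9}$"
)


def is_anagram(candidate: str) -> bool:
    """Return True if candidate is a rearrangement of 'зажигалка'."""
    return ANAGRAM.match(candidate) is not None


print(is_anagram("зажигалка"))  # True: the word itself
print(is_anagram("аааглзжик"))  # True: same letters, different order
print(is_anagram("галка"))      # False: too short for the {9} part
```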

    If any subset of the letters is allowed, then:

     ^(?!.*з.*з)(?!.*а.*а.*а.*а)(?!.*ж.*ж)(?!.*и.*и)(?!.*г.*г)(?!.*л.*л)(?!.*к.*к)[зажиглк]+$
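For an arbitrary source word, a pattern of this shape can be generated rather than typed by hand. The following is a minimal sketch in Python, assuming an engine with negative lookahead (Python's `re` has it); the function name `subset_regex` is hypothetical, and the generated pattern uses the compact form `(?!(?:.*x){n+1})` instead of spelling out `.*x.*x`.

```python
import re
from collections import Counter


def subset_regex(source: str) -> str:
    """Build a pattern for words that use each letter of source
    no more often than source itself does."""
    counts = Counter(source)
    # For a letter allowed n times, forbid n + 1 occurrences anywhere.
    guards = "".join(
        "(?!(?:.*%s){%d})" % (re.escape(ch), n + 1) for ch, n in counts.items()
    )
    allowed = "".join(re.escape(ch) for ch in counts)
    return "^" + guards + "[" + allowed + "]+$"


pattern = re.compile(subset_regex("зажигалка"))
for word in ["глаз", "газ", "галка", "мозг", "заза"]:
    print(word, bool(pattern.match(word)))
# глаз True
# газ True
# галка True
# мозг False  ("м" and "о" are not in the source word)
# заза False  ("з" occurs only once in the source word)
```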
    • Not quite. As I understand it, this pattern looks for words of exactly 9 letters, but I need to find all the words that can be made from these letters, i.e. "eye", "gas", "jackdaw", etc. In general, as in the game "Words from the Word" - Stas P.
    • @StasP., I have extended the answer. - Qwertiy ♦
    • @Qwertiy, and in which regular expression dialect is this supposed to work as the question requires? - aleksandr barakin
    • @alexanderbarakin, in any? I checked in JS. Where does it fail (assuming negative lookahead is supported)? - Qwertiy ♦
    • @Qwertiy, of the ones I know (and have available), it only works for me in pcre. Yes, you have half refuted my statement, but only half: additional code is still required to convert the original character set (I proceed from my understanding of the requirements: the source word must be arbitrary). - aleksandr barakin

    the “in the same quantity” condition (more precisely, as the comment on the other answer makes clear, “in a quantity no greater than in the source word”) cannot be met with the regular expression standards known to me, at least not with the regular expression engine alone, without additional code (see below).

    if the condition about the quantity is omitted and if the words go one per line, then, for example, like this:

     $ echo -e 'глаз\nгаз\nмозг\nгалка' | grep '^[зажигалка]\+$'
     глаз
     газ
     галка

    By additional code, I mean:

    1. convert the search string to a sorted list of letters, with a count quantifier for every letter that occurs more than once. for example: convert зажигалка to ^а{1,3}?г?ж?з?и?к?л?$ (or, slightly differently, using only count quantifiers and no ? quantifier: ^а{0,3}г{0,1}ж{0,1}з{0,1}и{0,1}к{0,1}л{0,1}$ )
    2. convert an input word to a sorted list of letters. for example: мозг → гзмо , глаз → агзл , заза → аазз .
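The two transformations above can also be sketched in Python, shown here only as an illustration next to the shell pipelines. The names `sorted_pattern`, `sorted_letters` and `fits` are made up for this sketch, and letters are ordered by code point, which is safe because the pattern and the input word are sorted the same way.

```python
import re
from collections import Counter


def sorted_pattern(source: str) -> str:
    """Step 1: sorted letters of source, each with a {0,n} quantifier."""
    counts = Counter(source)
    body = "".join("%s{0,%d}" % (re.escape(ch), counts[ch]) for ch in sorted(counts))
    return "^" + body + "$"


def sorted_letters(word: str) -> str:
    """Step 2: the letters of word in sorted order."""
    return "".join(sorted(word))


def fits(word: str, source: str) -> bool:
    """True if the sorted word matches the pattern built from source."""
    if not word:
        return False  # the {0,n} pattern would otherwise accept an empty word
    return re.match(sorted_pattern(source), sorted_letters(word)) is not None


print(sorted_pattern("зажигалка"))
# ^а{0,3}г{0,1}ж{0,1}з{0,1}и{0,1}к{0,1}л{0,1}$
for word in ["мозг", "глаз", "заза"]:
    print(word, sorted_letters(word), fits(word, "зажигалка"))
# мозг гзмо False
# глаз агзл True
# заза аазз False
```

Because both sides are sorted, no lookahead is needed, so the generated pattern also works in engines without that feature once the quantifiers are written in that engine's syntax.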

    if such transformations are performed, then the engine that understands bre (basic regular expressions) will do the job:

     $ echo -e 'гзмо\nагзл\nаазз' | grep '^а\{1,3\}\?г\?ж\?з\?и\?к\?л\?$'
     агзл

    An example of the implementation of the described transformations using posix-utilities:

    1. convert source word to regular expression bre:

       $ echo 'зажигалка' | sed 's/./&\n/g;s/.$//' | sort | uniq -c | \
           sed -r 's/\s*([0-9]+)\s*(.*)/\2\\{0,\1\\}/;1s/^/^/;$s/$/$/' | \
           sed ':a;N;s/\n//;ta'
       ^а\{0,3\}г\{0,1\}ж\{0,1\}з\{0,1\}и\{0,1\}к\{0,1\}л\{0,1\}$
    2. sorting letters:

       $ echo 'глаз' | sed 's/./&\n/g;s/.$//' | sort | sed ':a;N;s/\n//;ta'
       агзл

    update: in the other answer, Qwertiy demonstrated that the second of the transformations I described can be dispensed with if you use PCRE (Perl compatible regular expressions), which supports lookahead.

    in this case, the first conversion can be done by means of posix-utilities, for example, like this:

     $ w='зажигалка'; echo $w | sed 's/./&\n/g;s/.$//' | sort | uniq -c | \
         sed -r 's/^\s*([0-9]+)\s*(.)$/echo \\(?!\\(.*\2\\)\\{$((\1+1))\\}\\)/e;1s/^/^/;$s/$/['$w']+$/' | \
         sed ':a;N;s/\n//;ta'
     ^(?!(.*а){4})(?!(.*г){2})(?!(.*ж){2})(?!(.*з){2})(?!(.*и){2})(?!(.*к){2})(?!(.*л){2})[зажигалка]+$

    The resulting regular expression works correctly:

     $ echo -e 'мозг\nглаз\nзаза' | grep -P '^(?!(.*а){4})(?!(.*г){2})(?!(.*ж){2})(?!(.*з){2})(?!(.*и){2})(?!(.*к){2})(?!(.*л){2})[зажигалка]+$'
     глаз
    • No, this does not take the number of letters into account. заза matches too, although the source word contains only one з. - VenZell
    • such a requirement cannot be satisfied with the help of regular expression engines known to me: additional software code will definitely be required. - aleksandr barakin
    • alexandr barakin, was there any condition about Unix in the question? - Sasha Chernykh
    • @SashaBlack, and what does unix have to do with it? I showed an example of a regular expression for the bre standard (basic regular expressions). - aleksandr barakin
    • "with the help of the standards of regular expressions known to me it is impossible to fulfill" - disagree :) - Qwertiy ♦