I lied a little 3 years ago. The problem is solved in about 20 minutes.
I will describe only the algorithm itself, followed by a bunch of ugly code with almost no comments.
The actual, completely revised code with improvements and bugfixes can be found on GitHub:
https://github.com/ReinRaus/RegexCrossTool
So, the algorithm:
- Pass through all regular expressions and find all the specific characters in them. These are character classes
[az]
and single characters abc
. Ignore .
and [^az]
. The logic is simple: if a crossword puzzle has a single solution, then all the symbols of the solution are given by specific classes, not negation and not a full stop. The resulting list of characters is used to create a class later .
and denial. - Pass through all regular expressions and find in them character classes and single characters
r"\[\^?[az]+\]|[az]|\."
, there is no support for metacharacters \w\d
itp and characters other than az
, but it’s not hard to add if you really want to. In this crossword is not. Point 2 performs the function getEntitys( regex )
- We make the replacement of the found sequences with their screened representation with a wrapper in a non-preserving group. That is, instead of
[^df]
we will look for (?:[\[^d\-f\])
and character scrolls will appear in full brute force, not specific characters - We do a complete enumeration of character classes according to length.
checkEnt( entity, length )
. This means the expression [az]?x?.*
With a length of 3 will be represented by an array [ "[az]x.", "x..", "[az]..", "..." ]
These are all possible options for a given expression in character classes. - Optimization. Expression
#32
will be calculated for about 40 hours, so an optimization was written for it. Total time reduced to 15 minutes. It suits me, but during this time I began to write a second optimization, then commented out. Maybe someone is interested. - The result of the work of point 4 is saved to the hard disk It makes no sense to consider all this every time.
- Create a map board. If interested - Hexagonal board map
- Union We go through the results of work and create a list of characters that are possible for regular expression in specific positions. This is achieved by combining sets of
char2set( char )
derived from the conversion of character classes into sets - Intersection Pass through the map and find the intersection of sets for each position of the crossword puzzle field
- Filtering Filter the array of the result of paragraph 4 in accordance with the results of the intersection, remove those variants from the full iteration that do not correspond to regular expressions in the same cell of the crossword puzzle
- We do everything again starting from point 7, until filtering bears fruit. If the filtering did not make changes, then this algorithm did everything it could.
- We run a full search for regular expressions that contain backlinks, because these regular expressions were revealed inefficiently.
(.)\1*
will expand in ......
for example for length 6 - We leave only the appropriate values from the full search.
- To calm the soul a couple of times we make paragraphs 7-10.
- Display the result
Result:
nhpehasdiomomthfoxnxa xphmmommmmrhhmcxnmmcr xemcmccccmmmmmmhrxrcm iiihxlsoreoreoreorevc xcchhmxccrrrrhhhrrunc xdxexlerrddmmmmgcchhc c Total time: 408.3628809452057 sec.
sota.py
def dim1( k, n, d ): y = k x = abs( k - d + 1) + 2 * n return [ y, x ] def dim2( k, n, d ): t = k-d+1 y = n if t>0: y+= t x = abs( 1-d )+2*kn if t>0: x-= t return [ y, x ] def dim3( k, n, d ): t1 = k - d + 1 t2 = d - k - 1 y = n if t2>0: y+=t2 x = k + n if t1>0: x+=t1 return [ y, x ] def pushToRes( y, x, d, val, res, dimens ): ind = y*4*d+x if ( not ind in res.keys() ): res[ind] = {} res[ind][ dimens ] = val def pushToRes2( y, x, d, val, res, dimens ): global lens ind = y*4*d+x index = (dimens-1)*(2*d-1)+val[0] if dimens==3: val[1] = lens[val[0]]-val[1]-1 val = [ index, val[1] ] if ( not ind in res.keys() ): res[ind] = {} res[ind][ dimens ] = val res = {} d= 7 lens = list(range(d,2*d))+list(range(2*d-2,d-1,-1)) # d=3, lens = [3,4,5,4,3] for i in range( 0, 2*d-1 ): for j in range( lens[i] ): d1 = dim1( i, j ,d ) d2 = dim2( i, j ,d ) d3 = dim3( i, j ,d ) pushToRes2( d1[0], d1[1], d, [i,j], res, 1 ) pushToRes2( d2[0], d2[1], d, [i,j], res, 2 ) pushToRes2( d3[0], d3[1], d, [i,j], res, 3 ) maps = res #print( res ) print( len( res ) ) x = list(res.keys()) x.sort() res2 = [ res[i] for i in x ] #print( res2 ) #print( str( res2 ).replace( "{", "Array(").replace( ":", "=>").replace( "}", ")") )
resudoku.py
import sota, re, pickle, os, time reChars = r"\[\^?[az]+\]|[az]|\." reCharsC = re.compile( reChars, re.I ) def getEntitys( regex ): def repl( m ): return "(?:"+re.escape( m.group(0) )+")" res = reCharsC.findall( regex ) regex2 = reCharsC.sub( repl, regex ) regex2 = "^" + regex2 + "$" res = list( set( res ) ) return [ regex, re.compile( regex2, re.I), res ] def getAllRe(): return [ r".*H.*H.*", # 0 r"(DI|NS|TH|OM)*", # 1 r"F.*[AO].*[AO].*", # 2 r"(O|RHH|MM)*", # 3 r".*", # 4 r"C*MC(CCC|MM)*", # 5 r"[^C]*[^R]*III.*", # 6 r"(...?)\1*", # 7 r"([^X]|XCC)*", # 8 r"(RR|HHH)*.?", # 9 r"N.*XXX*E", # 10 r"R*D*M*", # 11 r".(C|HH)*", # 12 r"(ND|ET|IN)[^X]*", # 13 r"[CHMNOR]*I[CHMNOR]*", # 14 r"P+(..)\1.*", # 15 r"(E|CR|MN)*", # 16 r"([^MC]|MM|CC)*", # 17 r"[AM]*CM(RC)*R?", # 18 r".*", # 19 r".*PRR.*DDC.*", # 20 r"(HHX|[^HX])*", # 21 r"([^EMC]|EM)*", # 22 r".*OXR.*", # 23 r".*LR.*RL.*", # 24 r".*SE.*UE.*", # 25 r".*G.*V.*H.*", # 26 r"[CR]*", # 27 r".*XEXM*", # 28 r".*DD.*CCM.*", # 29 r".*XHCR.*X.*", # 30 r".*(.)(.)(.)(.)\4\3\2\1.*", # 31 r".*(IN|SE|HI)", # 32 r"[^C]*MMM[^C]*", # 33 r".*(.)C\1X\1.*", # 34 r"[CEIMU]*OH[AEMOR]*", # 35 r"(RX|[^R])*", # 36 r"[^M]*M[^M]*", # 37 r"(S|MM|HHH)*" # 38 ] def getFullABC(): result = set() for i in getAllRe(): for j in reCharsC.findall( i ): if not re.match( r"\[\^", j ) and not j == ".": result = result.union( char2set( j ) ) return result allLen = [7,8,9,10,11,12,13,12,11,10,9,8,7, 7,8,9,10,11,12,13,12,11,10,9,8,7, 7,8,9,10,11,12,13,12,11,10,9,8,7] def optimization( ent, length ): result = [] # оптимизация - альтернатива в конце строки одной длины result1 = [] reCh = r"(?:"+reChars+")+" re1 = r"\((?:"+reCh+r"\|)+"+reCh+r"\)$" opt1 = re.findall(re1, ent[0], re.I) if len( opt1 ) > 0: opt1 = opt1[0].replace( "(", "" ).replace( ")", "" ) parts = opt1.split( "|" ) count = list( set( [ len( re.findall( reChars, i, re.I ) ) for i in parts ] ) ) if len( count ) == 1: count = count[0] left = re.sub( re1, "", ent[0], flags=re.I ) result1 = checkEnt( getEntitys( left ), length-count ) for i in result1: for j in parts: result.append( i + j ) # несколько символов подряд (в строке нет скобок) ## result2 = [] ## if not re.search( r"[()]", ent[0] ): ## grp = re.findall( r"(?<![\[])[az]{2,}(?![\]*+?{])", ent[0], re.I ) ## if len( grp ) >0 : ## ent2 = getEntitys( ent[0].replace( grp[0], "#", 1 ) ) ## ent2[0].replace( "#", "(?:"+grp[0]+")" ) ## ent2[1] = re.compile( "^"+ent2[0]+"$", re.I ) ## ent2[2].append( grp[0] ) ## print( ent, "\n", ent2 ) if len( result ) > 0: return result return None def checkEnt( ent, length ): optim = optimization( ent, length ) if optim: return optim result = [] maxv = len( ent[2] ) if maxv == 1: result.append( ent[2][0]*length ) else: counter = [0]*(length+1) label = 0 print( ent[0] ) iterCounter = 1 while counter[length] == 0: text = "" for i in range( length ): text+= ent[2][ counter[i] ] if ent[1].match( text ): #print( text ) result.append( text ) counter[label]+= 1 while counter[label] == maxv: label+=1 counter[label]+= 1 while label>0: label-=1 counter[label] = 0 if iterCounter % 10000000 == 0: print( iterCounter ) iterCounter+=1 return result res22 = [] counter=0 reBack = [ i for i in range( len(getAllRe()) ) if re.search("\\\\\\d", getAllRe()[i] ) ] #reBack.reverse() print( reBack ) dump = "dump22.txt" if os.path.isfile( dump ): with open( dump, "rb" ) as rfile: res22 = pickle.load( rfile ) print( dump, "loaded." ) else: for i in getAllRe(): res22.append( checkEnt( getEntitys( i ), allLen[counter] ) ) counter+=1 with open( dump, "wb" ) as rfile: pickle.dump( res22, rfile ) #print( checkEnt( getEntitys( getAllRe()[8] ), 6 ) ) def char2set( char ): char=char.lower() result = None def convDiap( diap ): return re.sub( r"([az])\-([az])", repl, diap, flags=re.I ) def repl( m ): # dh -> defgh res = "" for i in range( ord( m.group(1) ), ord( m.group(2) )+1 ): res+= chr( i ) return res char = char.replace( ".", "[az]" ) char = convDiap( char ) if re.fullmatch( "[az]", char, re.I ): result = set( [char] ) elif re.fullmatch( r"\[[az]+\]", char, re.I ): result = set( char.replace( "[", "" ).replace( "]", "" ) ) elif re.fullmatch( r"\[\^[az]+\]", char, re.I ): result = set( char.replace( "[", "" ).replace( "]", "" ).replace( "^", "" ) ) result = fullABC - result return result def unionDiaps( diaps ): # перебираем все варианты и делаем сеты конкретных символов в конкретных позициях result = [None]*len(diaps) for i in range( len(diaps) ): # перебор наборов вариантов sets = [ set() ] * allLen[i] for j in diaps[i]: # конкретные варианты chars = reCharsC.findall( j ) for k in range( len(chars) ): sets[k] = sets[k].union( char2set( chars[k] ) ) result[i] = sets #print( diaps[i], result[i] ) return result def intersectMapping( diaps ): for i in sota.maps: res = None text = "" for j in sota.maps[i]: # пересечения в соответствии с картой iRe = sota.maps[i][j][0] iPos = sota.maps[i][j][1] text+= str( diaps[iRe][iPos] )+"\n" if res == None: res = diaps[iRe][iPos].copy() else: res = res & diaps[iRe][iPos] for j in sota.maps[i]: # записываем результат пересечений обратно iRe = sota.maps[i][j][0] iPos = sota.maps[i][j][1] if len( res ) ==0: print( diaps[iRe][iPos] ) diaps[iRe][iPos] = res.copy() text+="inter: "+str(res)+"\n" #print( text ) if len( res ) == 0: for j in sota.maps[i]: # записываем результат пересечений обратно iRe = sota.maps[i][j][0] iPos = sota.maps[i][j][1] print( "null set", iRe, iPos ) print() return diaps def filterDiaps( diaps, intersect ): changed = False for i in range( len(diaps) ): toDel = [] for k in range( len( diaps[i] ) ): ch = reCharsC.findall( diaps[i][k] ) mark = False for j in range( len( ch ) ): ls = "".join( list( intersect[i][j] ) ) if not re.search( ch[j], ls, re.I ): mark = True #print( k, diaps[i][k], ch[j], ls, mark ) if mark: toDel.append( k ) if len( toDel ) > 0: changed = True toDel.reverse() for k in toDel: del diaps[i][k] #print( diaps[i], intersect[i] ) return changed def checkReBack( intersect, diaps ): for i in reBack: text = "" counter = 0 sets = [] maxs = [] for j in intersect[i]: if len( j ) == 1: text+= list( j )[0] else: text+= "("+str(counter)+ ")" sets.append( list( j ) ) maxs.append( len( j ) ) counter+= 1 iters = 1 for j in maxs: iters*= j if iters>limitBack: continue # долго не значит, что поможет maxs.append( 0 ) length = len( sets ) counter = [0]*(length+2) label = 0 iterCounter = 1 result = [] while counter[length] == 0: text2 = text for j in range( length ): text2= text2.replace( "("+str(j)+")", sets[j][ counter[j] ], 1 ) if re.fullmatch( getAllRe()[i], text2, re.I ): result.append( text2 ) counter[label]+= 1 while counter[label] == maxs[label]: label+=1 counter[label]+= 1 while label>0: label-=1 counter[label] = 0 if iterCounter % 10000000 == 0: print( iterCounter ) iterCounter+=1 diaps[i] = result fullABC = getFullABC() print( fullABC ) t1 = time.time() limitBack = 1000000000000 # к сожалению так changed = True count = 0 countUn = 0 while changed or countUn < 5: union = unionDiaps( res22 ) intersect = intersectMapping( union ) changed = filterDiaps( res22, intersect ) if countUn == 2: checkReBack( intersect, res22 ) if not changed: countUn+=1 else: countUn = 0 count+= 1 print( count ) for y in range( sota.d*2-1 ): for x in range( sota.d * 4 ): if y*4*sota.d+x in sota.maps.keys(): val = sota.maps[ y*4*sota.d+x ][1] if len( intersect[val[0]][val[1]] ) == 1: print ( list( intersect[val[0]][val[1]])[0], end="" ) else: print( "*", end="" ) else: print( " ", end="" ) print() print( "\nTotal time:", time.time()-t1, "sec." )
A more detailed description of the algorithm