Good day!

introductory part

Stumbled upon a puzzle on the Internet. Presents a table in the form of a hexagon. Each line is defined by a regular expression. A regular expression determines what data can be contained in the specified string. He broke his whole brain, solving this puzzle, there was no solution at all. The puzzle is built on the principle of Japanese crosswords. Everything would be fine, but the table is a hexagon! ...

Psihanuv - decided to write a program that brute-force will find those letters that should be in the cells.

proper question

How to get rows from a hexagonal table? Well, how best to represent it in an array?

Link to the puzzle http://habrahabr.ru

Crossword itself

Thanks in advance for your advice.

Zzyzh solution is not requested. I want to solve it myself :-)

UPD:

I found a solution for myself in terms of array formation. In essence, this will be the following table:

---3-3-3--- --3-2-2-3-- -3-2-1-2-3- --3-2-2-3-- ---3-3-3--- 

Where the "-" is an empty value ... It remains the case for small, to write an algorithm to bypass the table on the sides of the hexagon. But I think this is already a matter of technology, since the idea of ​​forming a table was invented ...

Maybe someone has a better idea for data storage, so that it is convenient to analyze them?

  • Well, for everyone you can. For example, enter coordinates along two axes, then the third will be a diagonal. - VladD
  • Does the hint determine the entire string, or only that such a match is found in the string? - ReinRaus
  • one
    It would be interesting to look at the software solution that you, as I understand, want to do, because in fact you need to write a function that, having received a regular expression as input, returns a string of clearly defined characters, that is. function ("(kx) *. *", "?? z ????"); will return: "kxz ????" Unless of course you want to make a brute force, which is boring in implementation and debt in execution. ---------------- Read the question carefully - your grandchildren will decide more manually than you wait for the result of the brute force. - ReinRaus
  • What is the difficulty to sort through 3302 combinations programmatically? And many of the combinations will be cut off half way? - pincher1519
  • And where does this figure come from? The total number of variants is 28 ^ 127, approximately 6 * 10 ^ 183. Of course, yes, some options can be cut off, if there is a discrepancy to the expression, this will reduce the number of options, but not as much as we would like. - ReinRaus

3 answers 3

I lied a little 3 years ago. The problem is solved in about 20 minutes.
I will describe only the algorithm itself, followed by a bunch of ugly code with almost no comments.
The actual, completely revised code with improvements and bugfixes can be found on GitHub:
https://github.com/ReinRaus/RegexCrossTool

So, the algorithm:

  1. Pass through all regular expressions and find all the specific characters in them. These are character classes [az] and single characters abc . Ignore . and [^az] . The logic is simple: if a crossword puzzle has a single solution, then all the symbols of the solution are given by specific classes, not negation and not a full stop. The resulting list of characters is used to create a class later . and denial.
  2. Pass through all regular expressions and find in them character classes and single characters r"\[\^?[az]+\]|[az]|\." , there is no support for metacharacters \w\d itp and characters other than az , but it’s not hard to add if you really want to. In this crossword is not. Point 2 performs the function getEntitys( regex )
  3. We make the replacement of the found sequences with their screened representation with a wrapper in a non-preserving group. That is, instead of [^df] we will look for (?:[\[^d\-f\]) and character scrolls will appear in full brute force, not specific characters
  4. We do a complete enumeration of character classes according to length. checkEnt( entity, length ) . This means the expression [az]?x?.* With a length of 3 will be represented by an array [ "[az]x.", "x..", "[az]..", "..." ] These are all possible options for a given expression in character classes.
  5. Optimization. Expression #32 will be calculated for about 40 hours, so an optimization was written for it. Total time reduced to 15 minutes. It suits me, but during this time I began to write a second optimization, then commented out. Maybe someone is interested.
  6. The result of the work of point 4 is saved to the hard disk It makes no sense to consider all this every time.
  7. Create a map board. If interested - Hexagonal board map
  8. Union We go through the results of work and create a list of characters that are possible for regular expression in specific positions. This is achieved by combining sets of char2set( char ) derived from the conversion of character classes into sets
  9. Intersection Pass through the map and find the intersection of sets for each position of the crossword puzzle field
  10. Filtering Filter the array of the result of paragraph 4 in accordance with the results of the intersection, remove those variants from the full iteration that do not correspond to regular expressions in the same cell of the crossword puzzle
  11. We do everything again starting from point 7, until filtering bears fruit. If the filtering did not make changes, then this algorithm did everything it could.
  12. We run a full search for regular expressions that contain backlinks, because these regular expressions were revealed inefficiently. (.)\1* will expand in ...... for example for length 6
  13. We leave only the appropriate values ​​from the full search.
  14. To calm the soul a couple of times we make paragraphs 7-10.
  15. Display the result

Result:

  nhpehasdiomomthfoxnxa xphmmommmmrhhmcxnmmcr xemcmccccmmmmmmhrxrcm iiihxlsoreoreoreorevc xcchhmxccrrrrhhhrrunc xdxexlerrddmmmmgcchhc c Total time: 408.3628809452057 sec. 

sota.py

 def dim1( k, n, d ): y = k x = abs( k - d + 1) + 2 * n return [ y, x ] def dim2( k, n, d ): t = k-d+1 y = n if t>0: y+= t x = abs( 1-d )+2*kn if t>0: x-= t return [ y, x ] def dim3( k, n, d ): t1 = k - d + 1 t2 = d - k - 1 y = n if t2>0: y+=t2 x = k + n if t1>0: x+=t1 return [ y, x ] def pushToRes( y, x, d, val, res, dimens ): ind = y*4*d+x if ( not ind in res.keys() ): res[ind] = {} res[ind][ dimens ] = val def pushToRes2( y, x, d, val, res, dimens ): global lens ind = y*4*d+x index = (dimens-1)*(2*d-1)+val[0] if dimens==3: val[1] = lens[val[0]]-val[1]-1 val = [ index, val[1] ] if ( not ind in res.keys() ): res[ind] = {} res[ind][ dimens ] = val res = {} d= 7 lens = list(range(d,2*d))+list(range(2*d-2,d-1,-1)) # d=3, lens = [3,4,5,4,3] for i in range( 0, 2*d-1 ): for j in range( lens[i] ): d1 = dim1( i, j ,d ) d2 = dim2( i, j ,d ) d3 = dim3( i, j ,d ) pushToRes2( d1[0], d1[1], d, [i,j], res, 1 ) pushToRes2( d2[0], d2[1], d, [i,j], res, 2 ) pushToRes2( d3[0], d3[1], d, [i,j], res, 3 ) maps = res #print( res ) print( len( res ) ) x = list(res.keys()) x.sort() res2 = [ res[i] for i in x ] #print( res2 ) #print( str( res2 ).replace( "{", "Array(").replace( ":", "=>").replace( "}", ")") ) 

resudoku.py

 import sota, re, pickle, os, time reChars = r"\[\^?[az]+\]|[az]|\." reCharsC = re.compile( reChars, re.I ) def getEntitys( regex ): def repl( m ): return "(?:"+re.escape( m.group(0) )+")" res = reCharsC.findall( regex ) regex2 = reCharsC.sub( repl, regex ) regex2 = "^" + regex2 + "$" res = list( set( res ) ) return [ regex, re.compile( regex2, re.I), res ] def getAllRe(): return [ r".*H.*H.*", # 0 r"(DI|NS|TH|OM)*", # 1 r"F.*[AO].*[AO].*", # 2 r"(O|RHH|MM)*", # 3 r".*", # 4 r"C*MC(CCC|MM)*", # 5 r"[^C]*[^R]*III.*", # 6 r"(...?)\1*", # 7 r"([^X]|XCC)*", # 8 r"(RR|HHH)*.?", # 9 r"N.*XXX*E", # 10 r"R*D*M*", # 11 r".(C|HH)*", # 12 r"(ND|ET|IN)[^X]*", # 13 r"[CHMNOR]*I[CHMNOR]*", # 14 r"P+(..)\1.*", # 15 r"(E|CR|MN)*", # 16 r"([^MC]|MM|CC)*", # 17 r"[AM]*CM(RC)*R?", # 18 r".*", # 19 r".*PRR.*DDC.*", # 20 r"(HHX|[^HX])*", # 21 r"([^EMC]|EM)*", # 22 r".*OXR.*", # 23 r".*LR.*RL.*", # 24 r".*SE.*UE.*", # 25 r".*G.*V.*H.*", # 26 r"[CR]*", # 27 r".*XEXM*", # 28 r".*DD.*CCM.*", # 29 r".*XHCR.*X.*", # 30 r".*(.)(.)(.)(.)\4\3\2\1.*", # 31 r".*(IN|SE|HI)", # 32 r"[^C]*MMM[^C]*", # 33 r".*(.)C\1X\1.*", # 34 r"[CEIMU]*OH[AEMOR]*", # 35 r"(RX|[^R])*", # 36 r"[^M]*M[^M]*", # 37 r"(S|MM|HHH)*" # 38 ] def getFullABC(): result = set() for i in getAllRe(): for j in reCharsC.findall( i ): if not re.match( r"\[\^", j ) and not j == ".": result = result.union( char2set( j ) ) return result allLen = [7,8,9,10,11,12,13,12,11,10,9,8,7, 7,8,9,10,11,12,13,12,11,10,9,8,7, 7,8,9,10,11,12,13,12,11,10,9,8,7] def optimization( ent, length ): result = [] # оптимизация - альтернатива в конце строки одной длины result1 = [] reCh = r"(?:"+reChars+")+" re1 = r"\((?:"+reCh+r"\|)+"+reCh+r"\)$" opt1 = re.findall(re1, ent[0], re.I) if len( opt1 ) > 0: opt1 = opt1[0].replace( "(", "" ).replace( ")", "" ) parts = opt1.split( "|" ) count = list( set( [ len( re.findall( reChars, i, re.I ) ) for i in parts ] ) ) if len( count ) == 1: count = count[0] left = re.sub( re1, "", ent[0], flags=re.I ) result1 = checkEnt( getEntitys( left ), length-count ) for i in result1: for j in parts: result.append( i + j ) # несколько символов подряд (в строке нет скобок) ## result2 = [] ## if not re.search( r"[()]", ent[0] ): ## grp = re.findall( r"(?<![\[])[az]{2,}(?![\]*+?{])", ent[0], re.I ) ## if len( grp ) >0 : ## ent2 = getEntitys( ent[0].replace( grp[0], "#", 1 ) ) ## ent2[0].replace( "#", "(?:"+grp[0]+")" ) ## ent2[1] = re.compile( "^"+ent2[0]+"$", re.I ) ## ent2[2].append( grp[0] ) ## print( ent, "\n", ent2 ) if len( result ) > 0: return result return None def checkEnt( ent, length ): optim = optimization( ent, length ) if optim: return optim result = [] maxv = len( ent[2] ) if maxv == 1: result.append( ent[2][0]*length ) else: counter = [0]*(length+1) label = 0 print( ent[0] ) iterCounter = 1 while counter[length] == 0: text = "" for i in range( length ): text+= ent[2][ counter[i] ] if ent[1].match( text ): #print( text ) result.append( text ) counter[label]+= 1 while counter[label] == maxv: label+=1 counter[label]+= 1 while label>0: label-=1 counter[label] = 0 if iterCounter % 10000000 == 0: print( iterCounter ) iterCounter+=1 return result res22 = [] counter=0 reBack = [ i for i in range( len(getAllRe()) ) if re.search("\\\\\\d", getAllRe()[i] ) ] #reBack.reverse() print( reBack ) dump = "dump22.txt" if os.path.isfile( dump ): with open( dump, "rb" ) as rfile: res22 = pickle.load( rfile ) print( dump, "loaded." ) else: for i in getAllRe(): res22.append( checkEnt( getEntitys( i ), allLen[counter] ) ) counter+=1 with open( dump, "wb" ) as rfile: pickle.dump( res22, rfile ) #print( checkEnt( getEntitys( getAllRe()[8] ), 6 ) ) def char2set( char ): char=char.lower() result = None def convDiap( diap ): return re.sub( r"([az])\-([az])", repl, diap, flags=re.I ) def repl( m ): # dh -> defgh res = "" for i in range( ord( m.group(1) ), ord( m.group(2) )+1 ): res+= chr( i ) return res char = char.replace( ".", "[az]" ) char = convDiap( char ) if re.fullmatch( "[az]", char, re.I ): result = set( [char] ) elif re.fullmatch( r"\[[az]+\]", char, re.I ): result = set( char.replace( "[", "" ).replace( "]", "" ) ) elif re.fullmatch( r"\[\^[az]+\]", char, re.I ): result = set( char.replace( "[", "" ).replace( "]", "" ).replace( "^", "" ) ) result = fullABC - result return result def unionDiaps( diaps ): # перебираем все варианты и делаем сеты конкретных символов в конкретных позициях result = [None]*len(diaps) for i in range( len(diaps) ): # перебор наборов вариантов sets = [ set() ] * allLen[i] for j in diaps[i]: # конкретные варианты chars = reCharsC.findall( j ) for k in range( len(chars) ): sets[k] = sets[k].union( char2set( chars[k] ) ) result[i] = sets #print( diaps[i], result[i] ) return result def intersectMapping( diaps ): for i in sota.maps: res = None text = "" for j in sota.maps[i]: # пересечения в соответствии с картой iRe = sota.maps[i][j][0] iPos = sota.maps[i][j][1] text+= str( diaps[iRe][iPos] )+"\n" if res == None: res = diaps[iRe][iPos].copy() else: res = res & diaps[iRe][iPos] for j in sota.maps[i]: # записываем результат пересечений обратно iRe = sota.maps[i][j][0] iPos = sota.maps[i][j][1] if len( res ) ==0: print( diaps[iRe][iPos] ) diaps[iRe][iPos] = res.copy() text+="inter: "+str(res)+"\n" #print( text ) if len( res ) == 0: for j in sota.maps[i]: # записываем результат пересечений обратно iRe = sota.maps[i][j][0] iPos = sota.maps[i][j][1] print( "null set", iRe, iPos ) print() return diaps def filterDiaps( diaps, intersect ): changed = False for i in range( len(diaps) ): toDel = [] for k in range( len( diaps[i] ) ): ch = reCharsC.findall( diaps[i][k] ) mark = False for j in range( len( ch ) ): ls = "".join( list( intersect[i][j] ) ) if not re.search( ch[j], ls, re.I ): mark = True #print( k, diaps[i][k], ch[j], ls, mark ) if mark: toDel.append( k ) if len( toDel ) > 0: changed = True toDel.reverse() for k in toDel: del diaps[i][k] #print( diaps[i], intersect[i] ) return changed def checkReBack( intersect, diaps ): for i in reBack: text = "" counter = 0 sets = [] maxs = [] for j in intersect[i]: if len( j ) == 1: text+= list( j )[0] else: text+= "("+str(counter)+ ")" sets.append( list( j ) ) maxs.append( len( j ) ) counter+= 1 iters = 1 for j in maxs: iters*= j if iters>limitBack: continue # долго не значит, что поможет maxs.append( 0 ) length = len( sets ) counter = [0]*(length+2) label = 0 iterCounter = 1 result = [] while counter[length] == 0: text2 = text for j in range( length ): text2= text2.replace( "("+str(j)+")", sets[j][ counter[j] ], 1 ) if re.fullmatch( getAllRe()[i], text2, re.I ): result.append( text2 ) counter[label]+= 1 while counter[label] == maxs[label]: label+=1 counter[label]+= 1 while label>0: label-=1 counter[label] = 0 if iterCounter % 10000000 == 0: print( iterCounter ) iterCounter+=1 diaps[i] = result fullABC = getFullABC() print( fullABC ) t1 = time.time() limitBack = 1000000000000 # к сожалению так changed = True count = 0 countUn = 0 while changed or countUn < 5: union = unionDiaps( res22 ) intersect = intersectMapping( union ) changed = filterDiaps( res22, intersect ) if countUn == 2: checkReBack( intersect, res22 ) if not changed: countUn+=1 else: countUn = 0 count+= 1 print( count ) for y in range( sota.d*2-1 ): for x in range( sota.d * 4 ): if y*4*sota.d+x in sota.maps.keys(): val = sota.maps[ y*4*sota.d+x ][1] if len( intersect[val[0]][val[1]] ) == 1: print ( list( intersect[val[0]][val[1]])[0], end="" ) else: print( "*", end="" ) else: print( " ", end="" ) print() print( "\nTotal time:", time.time()-t1, "sec." ) 

A more detailed description of the algorithm

  • getEntitys exactly like that? - Nick Volynkin
  • And the decision is cool, impressed. ) - Nick Volynkin
  • Yes, I was also amused that at first the task seems incredibly difficult, but when you come up with the best way to present the data, then everything goes like clockwork. - ReinRaus

Without affecting the details of the search, only the storage structure ... I propose to describe it through the graph.

Each cell contains a link to three adjacent - B, NW, SW, in the direction of the regulars. The regulars themselves contain a link to the corresponding first cell. From the starting cell of the regular season X we move away in the necessary direction Y times and get the value of the given cell. For operations with cells, the following functions will suffice:

 char getCell(char direction, int regex, int position)//получить значение char setCell(char direction, int regex, int position)//установить значение string getText(char direction, int regex)//получить текст для проверки 

    As far as I can see, three axes are used, 13 different values ​​are indicated on each of the axes, that is, you can use a three-dimensional array with coordinates [0..12, 0..12, 0..12] , or [1..13, 1..13, 1..13] , or even [-6..+6, -6..+6, -6..+6] , if your language allows negative array bounds.

    Moreover, since the coordinates are still two-dimensional, and not three-dimensional, any third coordinate depends on the other two. For example, if the coordinates are given by the range [0..12] , then you can use the formula x + y + z = 12 . Note that not for any pair of x and y from 0 to 12, there is a value of z from 0 to 12.