I do not know how correct the title itself looks, but I will describe it in more detail here.

So, the essence is this: there is, for example, a certain range and a couple of conditions:

  • $start and $end always have the same character length (although this is not so important);

  • And if there is a range that includes the element 555123456 ( 9 characters long), then there is no other range that would include a number that is longer than the existing one and starts from 555 - i.e. The number 555123456 7 (length - 10 characters) does not exist by definition.

  • The last digits in $start and $end 0 and 9 respectively, which also makes the task easier.

Those. (hereinafter I write conditionally, without reference to any programming language or regexp standard)

 555000000 - 555999999 // Т.е. 555* 666000000 - 666999999 // Т.е. 666* 5550000000 - 5559999999 // Не существует в списке диапазонов по определению задачи, игнорируем. 6660000000 - 6669999999 // Не существует в списке диапазонов по определению задачи, игнорируем. 

In other words: if 555000000 - 555999999 is specified, then no 555000000 0 - 555999999 9

Take for example

 555000000 - 555999999 

Task: convert to 555* . It’s easy to figure out this example - you need to subtract $start from $end to get a certain $result variable, which is equal to 999999 . Then use the result value of the $result.length() operation relative to, say, $start , to get the line 555*

 $shortrec = substr($start,0,-strlen($result)) . "*"; 

But if the range looks like

 777120000 - 777259999 

then the difference will be 139999 . Here it is more difficult, but also understandable - in addition to partly indicated above, we compare positions 0 and 9 in both variables, “exit” by 12 and process 12 additionally, increasing it 13 times and getting the following sequence in a loop:

 77712* 77713* 77714* ... 77725* 

But what if there is a range,

 888125120 - 888959599 

(here we will recall condition number 3 facilitating logic, so as not to operate with an expression like [5-9] if $start would be 88812512 5 ) for which you need to get

 88812512* 8882* 8883* ... 8888* 88890* 88891* 88892* ... 88894* 888950* 888951* 888952* ... 888958* 8889590* 8889591* 8889592* ... 8889595* 
  • I ask you to judge whether I follow the correct logic from the very beginning or not?
  • And is there some universal solution (algorithm) for such problems?
  • What would you do?

PS Programming language is not important - the approach itself and the algorithm are important. I hope I did not break the brain reader, but the task is not from the training, but the real one. So if there are ideas, I will be very grateful.

  • 88812512 * - not enough. Another 8881252 **, 888126 ***, 88813 **** will have to be considered ... - gecube
  • @gecube That's right, thanks. I just wrote in haste. Just the goal itself, the most important thing is clear. - void
  • one
    And by the way, why do you need a regular expression? Is not it easier to simply compare the number with the upper and lower bound? - VladD
  • First of all, thank you so much for your attention, and secondly I want to apologize for the late reply - unforeseen circumstances in the form of an unexpected broken plumbing and a minor repair in the apartment. Accordingly, I could not consider in detail these days all the answers and comments :) - void
  • @VladD The fact is that there is some old system that operates in this way, and it also checks the uniqueness of the number in order to prevent a small range from entering, for example, 5550100-5550199, into the existing larger 5550000-5559999. The new system has an interface that requires exactly regulars. Therefore, it became necessary to convert and export data. Although I managed to convince the vendor and provide a compatibility interface for using ranges. Therefore, the issue has been resolved in my favor, and the question here goes into the thematic and interesting category :) - void

3 answers 3

We will write the generation algorithm for a particular case - when the number of characters at the lower and upper bound is the same. If the number of characters is different, then the initial range will be divided into 3 ranges:

  1. from the minimum limit to the maximum number with the same number of digits
  2. all full ranges of numbers that were not affected by boundaries
  3. from the minimum number with the same number of digits as the maximum limit to the maximum limit

The example makes it clearer what all this means. The original range of 14-123456 divided into three ranges of 14-99, 100-99999, 100000-123456 .
If the second range is present, then it is defined by a simple expression \d{n,m} , the first and second ranges need to be generated separately and within the limits of these ranges always the same number of characters, which means that we can apply the basic generation algorithm and then combine these 3 ranges with an alternative.

The basic algorithm for ranges with the same number of digits

  1. We will try to find a common part for the upper and lower bounds. For example, in the range 1234-1278 you can put out 12 as ordinary characters.
  2. Let's start viewing the borders from left to right. For example, the range 1234-5678
  3. The first digits of the lower and upper bounds 1 and 5 , therefore, we will divide into 3 ranges: 1234-1999, 2000-4999, 5000-5678 , that is, we would like to single out the middle, which can be represented as [2-4]\d{3}
  4. The remaining two ranges will be processed using the same algorithm. For example, for 1234-1999 first put the number 1 as the total. The range of 234-999 we divide into 2 ranges of 234-299, 300-999
  5. And so on for all ranges that appear during splitting into multiple ranges.

There is a lot of code, but you can run and test it.

 $( document ).ready( function() { $( "#rangeLeft, #rangeRight" ).keydown( function() { clearDisplay(); } ); $( "#run" ).click( function() { clearDisplay(); var rangeLeft = $( "#rangeLeft" ).val(); var rangeRight = $( "#rangeRight" ).val(); if ( ! checkRanges( rangeLeft, rangeRight ) ) return; $( "#result" ).append( generateFullRegExp(rangeLeft, rangeRight)+"<BR/>" ); $( "#test" ).show(); } ); $( "#test" ).click( function() { var rangeLeft = $( "#rangeLeft" ).val(); var rangeRight = $( "#rangeRight" ).val(); var re = new RegExp( "^"+generateFullRegExp(rangeLeft, rangeRight)+"$" ); for( var i=Math.pow( 10, rangeLeft.length-1 ); i<Math.pow( 10, rangeRight.length); i++ ) { if ( re.test( i+"" ) && ( i<parseInt(rangeLeft) || i>parseInt(rangeRight) ) ) $( "#result" ).append( "Тест провален на: " + i+"<BR/>" ); if ( !re.test( i+"" ) && ( i>parseInt(rangeLeft) && i<parseInt(rangeRight) ) ) $( "#result" ).append( "Тест провален на: " + i+"<BR/>" ); }; $( "#result" ).append( "Тест пройден от "+Math.pow( 10, rangeLeft.length-1 )+" до "+i+"<BR/>" ); } ); } ); function checkRanges( rangeLeft, rangeRight ) { if ( /\D/.test( rangeLeft ) || /\D/.test( rangeRight ) ) { $( "#result" ).append( "Введите два числа<BR/>" ); return false; }; rangeLeft = parseInt( rangeLeft ); rangeRight = parseInt( rangeRight ); if ( isNaN( rangeLeft ) || isNaN( rangeRight ) ) $( "#result" ).append( "Не указаны границы диапазонов<BR/>" ); if ( rangeLeft < 0 ) $( "#result" ).append( "Левая граница меньше 0<BR/>" ); if ( rangeRight < 0 ) $( "#result" ).append( "Правая граница меньше 0<BR/>" ); if ( rangeLeft > rangeRight ) $( "#result" ).append( "Левая граница больше правой границы<BR/>" ); return( !( rangeLeft < 0 || rangeRight < 0 || rangeLeft > rangeRight || isNaN( rangeLeft ) || isNaN( rangeRight ) ) ); }; function maxBeginStr( str1, str2 ) { var res = /^(.*)[^-]*\-\1/.exec( str1 + "-" + str2 ); return res ? res[1] : ""; }; function midDiap( start, end ){ var st0int = parseInt( start[0] ); var en0int = parseInt( end[0] ); if ( st0int+1 == en0int ) { var res = end[0]; } else { if ( st0int == en0int-2 ) { res=( st0int+1 )+""; } else { res="["+( st0int+1 )+"-"+(en0int-1)+"]"; }; }; if ( start.length == 1 ) return res; return res+"\\d{"+(start.length-1)+"}"; }; function lowDiap( num, pos ) { var res = num.substr(0, pos); var highRange = parseInt( num[pos] )-1; if ( highRange == -1 && pos == num.length-1 ) return num; if ( highRange == -1 ) return null; // выражение можно не включать if ( pos == num.length-1 ) highRange++; res += "[0-"+highRange+"]"; if ( num.length != pos+1 ){ res += "\\d{"+(num.length-pos-1)+"}"; } return res; }; function highDiap( num, pos ) { var res = num.substr(0, pos); var lowRange = parseInt( num[pos] )+1; if ( lowRange==10 && pos == num.length-1 ) return num; if ( lowRange==10 ) return null; // выражение можно не включать if ( pos == num.length-1 ) lowRange--; res += "["+lowRange+"-9]"; if ( num.length != pos+1 ){ res += "\\d{"+( num.length -pos-1)+"}"; } return res; }; function getRegExp( start, end ) { if ( start.length != end.length ) return "Invalid input"; var res= maxBeginStr( start, end ); start= start.substr( res.length ); end = end.substr( res.length ); if ( start.length == 0 ) return res; var st0int = parseInt( start[0] ); var en0int = parseInt( end[0] ); var resArr= Array(); if ( start.length > 1 ) { if ( st0int != en0int-1 ) { resArr.push( midDiap( start, end) ); } for ( var i=1; i<end.length; i++ ) { var miniRe = lowDiap( end, i ); if ( miniRe != null ) { resArr.push( miniRe ); } miniRe = highDiap( start, i ); if ( miniRe != null) { resArr.push( miniRe ); } } } else { resArr.push( "["+start+"-"+end+"]" ); }; return res+"(?:"+resArr.join("|")+")"; }; function generateFullRegExp( rangeLeft, rangeRight ) { if ( rangeLeft.length == rangeRight.length ) { return getRegExp( rangeLeft, rangeRight ); }; var resArr = Array(); resArr.push( getRegExp( rangeLeft, "9".repeat( rangeLeft.length ) ) ); if ( rangeRight.length - rangeLeft.length > 1 ) resArr.push( ("\\d{"+(rangeLeft.length+1)+","+(rangeRight.length-1)+"}").replace(/(\d+),\1/, "$1") ); resArr.push( getRegExp( "1"+"0".repeat( rangeRight.length-1 ), rangeRight ) ); return "(?:"+resArr.join("|")+")"; }; function clearDisplay() { $( "#result" ).html( "" ); $( "#test" ).hide(); }; 
 #test { display:none } 
 <script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script> <BODY> <INPUT id="rangeLeft" value=1234 /> - <INPUT id="rangeRight" value=4567 /> <BR/> <BUTTON id="run">Генерировать</BUTTON> <PRE id="result" ></PRE> <BUTTON id="test">Проверить</BUTTON> </BODY> 

Existing defect
For a range of 10-29 etc. a redundant expression is generated (?:2[0-9]|1[0-9])

  • Thank you The approach is very meaningful and complex (although the very formulation of the problem is complex). - void

Because the bounds of the ranges have the same bit width, then you can use this method of converting the range into a regular one - this is a bitwise comparison of the boundaries and the generation of retrospective checks for all previous digits relative to the current position.

For a range of 813-895, the expression will be for example:

 8[1-9](?:(?<=1)[3-9]|(?<=[2-8])[0-9]|(?<=9)[0-5]) 

Please note that three cases are considered here relative to the average discharge:

  1. The value is equal to the lower border - 1, so the current value can not be less than 3.
  2. The value between the lower and upper bounds, so the current value can be any digit.
  3. The value is equal to the upper limit - 9, so the current value can not be greater than 5.

The higher the bit depth, the harder it will turn out to be a finite regular expression, since For each low order it is necessary to perform retrospective checks for all high categories.

  • Thank you - it was very interesting to consider the logic! - void

I would do a recursive descent: (C #)

 IEnumerable<string> GetPatterns(string start, string end) { if (start.Length == 0) { yield return ""; yield break; } var startPrefix = start.Substring(0, start.Length - 1); var endPrefix = end.Substring(0, end.Length - 1); var startSuffix = start.Last(); var endSuffix = end.Last(); var rec = GetPatterns(startPrefix, endPrefix).ToList(); for (int i = 0; i < rec.Count; i++) { char startDigit = (i == 0) ? startSuffix : '0'; char endDigit = (i == rec.Count - 1) ? endSuffix : '9'; if (startDigit == '0' && endDigit == '9') { yield return rec[i] + '*'; } else { for (var digit = startDigit; digit <= endDigit; digit++) yield return rec[i] + digit; } } } 

Check: http://ideone.com/xihhpr

Pay attention to the result for the interval 5050000-5959999 - do you need this, or not?

  • Thanks again - a very correct approach, and even on C # - it was exactly what I was going to use for the working version. Plus to all, since all the answers deserve thanks and speak of remarkable logical thinking. - void
  • @void: thanks! - VladD