Faced such a problem: it is necessary to implement an algorithm for comparing two lines - the difference should not exceed 2 characters (case change is not considered).

What did you do:

  1. I translate all the letters in lower case. I break lines into an array.

  2. If the difference in the length of the array> 2 - error

  3. If the number of changed characters is> 4 (delete 2, add 3; replace 2 characters, add one, etc.) - error

The code is implemented on the example of PHP:

$oldValue = preg_split('//u', (mb_strtolower($oldValue)),-1, PREG_SPLIT_NO_EMPTY); $newValue = preg_split('//u', (mb_strtolower($newValue))), -1, PREG_SPLIT_NO_EMPTY); if(abs((count($oldValue) - count($newValue))) > 2 || count(array_diff($newValue, $oldValue)) + count(array_diff($oldValue ,$newValue)) > 4) { throw new Exception('bla-bla'); } 

Successfully work out on pairs:

 Ивановвввв -> Иванов (Exception) Иванов -> Иванв Иванов -> Иван иИванов -> Иванов ииванова -> Иванов Иванокк -> Иванов Пелоге -> Пелагея 

But it does not work correctly on pairs:

 Пелашш -> Пелагея 

Someone already faced with this kind of tasks?

How to find the optimal solution without listing all possible cases?

  • one
    Well, yes, I checked with examples, if the difference in the center is> 2. Maybe similar_text, levenshtein or soundex can help. But there is already a percentage. Either there are separate extensions pear.horde.org there is a great deal for this. - tcpack4
  • the difference should not exceed 2 characters (case change is not considered) What we are talking about is Levenshtein distance - you understand, this is good. True, it is not clear whether you agree with the set of operations used in this implementation of the function. Because when the case Pelashsh -> Pelagia - says, then the number of transition 3 is correct: (Pelashsh -> (replacement) -> Sang r w -> (replacement) -> Pelag e -> (addition) -> Pelagia I ). - Akina

0