Check for the number of characters changed

Question

Faced such a problem: it is necessary to implement an algorithm for comparing two lines - the difference should not exceed 2 characters (case change is not considered).

What did you do:

I translate all the letters in lower case. I break lines into an array.
If the difference in the length of the array> 2 - error
If the number of changed characters is> 4 (delete 2, add 3; replace 2 characters, add one, etc.) - error

The code is implemented on the example of PHP:

$oldValue = preg_split('//u', (mb_strtolower($oldValue)),-1, PREG_SPLIT_NO_EMPTY); $newValue = preg_split('//u', (mb_strtolower($newValue))), -1, PREG_SPLIT_NO_EMPTY); if(abs((count($oldValue) - count($newValue))) > 2 || count(array_diff($newValue, $oldValue)) + count(array_diff($oldValue ,$newValue)) > 4) { throw new Exception('bla-bla'); }

Successfully work out on pairs:

 Ивановвввв -> Иванов (Exception) Иванов -> Иванв Иванов -> Иван иИванов -> Иванов ииванова -> Иванов Иванокк -> Иванов Пелоге -> Пелагея

But it does not work correctly on pairs:

 Пелашш -> Пелагея

Someone already faced with this kind of tasks?

How to find the optimal solution without listing all possible cases?

Well, yes, I checked with examples, if the difference in the center is> 2. Maybe similar_text, levenshtein or soundex can help.
But there is already a percentage. Either there are separate extensions pear.horde.org there is a great deal for this.
the difference should not exceed 2 characters (case change is not considered) What we are talking about is Levenshtein distance - you understand, this is good.
True, it is not clear whether you agree with the set of operations used in this implementation of the function.
Because when the case Pelashsh -> Pelagia - says, then the number of transition 3 is correct: (Pelashsh -> (replacement) -> Sang r w -> (replacement) -> Pelag e -> (addition) -> Pelagia I ).

Check for the number of characters changed

0

More articles: