There is a city template (its full name) and an array obtained by parsing XML. I need to check whether the parsed array matches the existing template.

For example, the template is "city Dubna Moscow District".

The XML says that the city of Dubna belongs to the Moscow region.

Or the XML says that the village of Petushki belongs to "Petushki" from the template. In that case the first occurrence (the VILLAGE) has to be discarded, and the remaining Petushki compared with the Petushki from the template.

  • Please give a real example of a template, exactly as it appears in your code, together with a piece of the XML that contains the match you are interested in. Then look into the Levenshtein algorithm and follow the links from there. - rjhdby
  • Thank you, this is just what I need, a very good algorithm, BUT from time to time you have to bolt on crutches (additional conditions). Thank you! :) Besides Levenshtein, is there a more accurate way? - Zimzibar

1 answer

I would use roughly the following algorithm.

For both the XML array and the template:

  1. Replace all non-letter, non-digit characters with spaces.
  2. Collapse consecutive spaces into a single space.
  3. Convert the strings to one case (say, upper).
  4. Split the strings into arrays of words.
  5. Sort the arrays by value.

Then there are three options:

  1. Write a function that implements the Levenshtein algorithm for two arrays, that is, one that computes the difference coefficient between arrays of words rather than between strings.
  2. Glue the arrays back into strings and use the standard Levenshtein algorithm (this is risky if the strings differ greatly in length).
  3. Check whether the smaller array is a subset of the larger one, perhaps with a tolerance of one element.
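As an illustration of option 1, here is a minimal sketch of a word-level Levenshtein distance, assuming both inputs are already normalized and sorted arrays of words. The function name `wordLevenshtein` is made up for this example, and counting every inserted, deleted, or replaced word as one edit is an assumption, not something fixed by the steps above.

```php
<?php
// Hypothetical helper: edit distance between two arrays of words,
// where one edit inserts, deletes, or replaces a whole word.
function wordLevenshtein(array $a, array $b): int
{
    $a = array_values($a);
    $b = array_values($b);
    $n = count($a);
    $m = count($b);

    // $prev[$j] holds the distance between the words of $a processed
    // so far (initially none) and the first $j words of $b.
    $prev = range(0, $m);

    for ($i = 1; $i <= $n; $i++) {
        $curr = array($i);
        for ($j = 1; $j <= $m; $j++) {
            $cost = ($a[$i - 1] === $b[$j - 1]) ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,         // delete a word from $a
                $curr[$j - 1] + 1,     // insert a word from $b
                $prev[$j - 1] + $cost  // keep or replace a word
            );
        }
        $prev = $curr;
    }

    return $prev[$m];
}

// Example input (assumed, already normalized and sorted):
$template = array('DISTRICT', 'DUBNA', 'MOSCOW');
$fromXml  = array('DUBNA', 'MOSCOW', 'REGION');
$distance = wordLevenshtein($template, $fromXml);
```

A small distance relative to the number of words suggests a match. Where to put the threshold depends on the data and has to be tuned.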

The first part of the Marlezon ballet:

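    // Turn a string into a sorted array of upper-case words.
    function normalize(&$value) {
        // Non-word characters become spaces (/u keeps Cyrillic letters intact),
        // then runs of spaces collapse into one.
        $value = preg_replace(array('/[\W_]/u', '/ +/'), ' ', $value);
        $value = explode(' ', trim(mb_strtoupper($value, 'UTF-8')));
        sort($value); // sort() works in place and returns a bool
    }

    $template = "template";
    $xml = array("array of values to check");
    normalize($template);
    foreach ($xml as &$value) {
        normalize($value);
    }
    unset($value); // break the reference left over from the loop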

The second part is up to you.
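For what it's worth, options 2 and 3 could be sketched roughly as follows, assuming the inputs are already normalized arrays of words. The names `gluedDistance` and `isSubsetWithTolerance` are invented for this illustration. Note that PHP's built-in `levenshtein()` counts bytes rather than characters, so distances for Cyrillic text come out larger than the number of changed letters.

```php
<?php
// Option 2 (sketch): glue the words back together and use the built-in levenshtein().
function gluedDistance(array $a, array $b): int
{
    return levenshtein(implode(' ', $a), implode(' ', $b));
}

// Option 3 (sketch): true if the smaller array is a subset of the larger one,
// allowing up to $tolerance words of the smaller array to be missing.
function isSubsetWithTolerance(array $a, array $b, int $tolerance = 1): bool
{
    if (count($a) <= count($b)) {
        $small = $a;
        $large = $b;
    } else {
        $small = $b;
        $large = $a;
    }
    // Words of the smaller array that do not occur in the larger one.
    $missing = array_diff($small, $large);
    return count($missing) <= $tolerance;
}

// Example input (assumed, already normalized and sorted):
$template = array('DISTRICT', 'DUBNA', 'MOSCOW');
$fromXml  = array('DUBNA', 'MOSCOW', 'REGION');

$matchesLoosely  = isSubsetWithTolerance($template, $fromXml);    // one word may differ
$matchesStrictly = isSubsetWithTolerance($template, $fromXml, 0); // no word may differ
```

With the default tolerance the two arrays above count as a match, because only DISTRICT is missing from the XML side. With a tolerance of zero they do not.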