I have data with information about users:

users: [{ first_name: 'Вова', last_name: 'Нуйкин', domain: 'kebab' }, { first_name: 'Вова', last_name: 'Воробьев', domain: 'vorobey' }] 

My task is to do a search by users, but I do not know which search algorithm will be good, for searching by users, if I type in the search line "Vova" or "Vova Nu" or "Nuyk" or "kebab".

I wrote a function in which I check the first N letters in user.first_name , user.last_name and user.domain . The search works when I type "Vova", "Nuikin", or "kebab", but problems start when I look for "Vova Nuykin". In this case, searchResults Vova Nuikin and Vova Vorobyev are recorded in searchResults . How can I fix this?

This is my function:

 search = (users, value) -> searchResults = [] for user in users for value in values if user.first_name.toLowerCase().slice(0, value.length).includes(value) or user.last_name.toLowerCase().slice(0, value.length).includes(value) or user.domain.toLowerCase().slice(0, value.length).includes(value) searchResults.push(user) return searchResults 
  • one
    This is a bad algorithm. There will be problems with him not only because of the concatenation of the name and surname, but also because of typos like Shuikin / Nuykin or Sparrows / Varabiev . I advise you to get acquainted with the algorithms of fuzzy search, based on the measurement of the Damerau -Levenstein distance - igumnov
  • My task is not to check typos like Shuykin / Nuykin or Sparrows / Varabiev. - Bob Napkin

2 answers 2

Perhaps this algorithm will work:

  1. Divide user input into words by spaces.
  2. For each table entry, include it in the result only if each word of the user’s input is included in at least one field of the record.

If you need inaccurate entries, try a more complicated procedure:

  1. Divide user input into words by spaces.
  2. For each entry in the table, calculate its heuristic coefficient of conformity: the more it fits, the greater the ratio. For example:
    • if the word from the input matches one of the table fields, add to index 1.1, and exclude this word and this field from further comparison for this record
    • if the word from the input is a substring of some of the table fields, add to index 1, exclude this word from further comparison for this record, and lower the weight of this field
    • if the word from the input is an inaccurate substring of some of the table fields, add 1 / (1 + number of errors) to the index, exclude this word from further comparison for this record, and lower the weight of this field
  3. Sort the table entries by the compliance index, output the top element, as well as those whose index value is at least 80% of the maximum.

(Figures and coefficients taken from the ceiling.)

    For such things, nothing much faster than the trinity trees have come up with. Ergo, if you need an exact search:

    1. Immediately build 2 trees, in which the keys will be either the name or the surname.
    2. We break the input into words, and we look for each word in both trees.
    3. We display the results only if there are coincidences both there and there, if there are more than one words, or at least one if there is one word.

    With a not very large number of records, all this can be implemented on the client side, and work without any access to the database at all. The implementation of ternary trees in JavaScript is run without problems, for example: one , two (note the prefixSearch() method in the examples - it’s the !) etc.