There is a database of videos, each with a title consisting of several words. The database holds about 6 million records. The input is a set of search queries (each also consisting of several words). For each query, I need to select the most relevant videos from the database. The main criterion is speed. Here is what I have tried:

  1. I split each search query into an array of words and ran MySQL queries of the form:

    SELECT * FROM videos WHERE title LIKE "%word1%" AND title LIKE "%word2%" AND title LIKE "%word3%"

    If nothing was found, I removed some of the words from the query and ran it again. This was very slow.

  2. I loaded all the records from the database and looped over them. Inside the loop, I split each search query into words and computed what fraction of the query's words appeared in the video title. If that fraction was greater than 0.7, I added the video to the results for that search query. This was also too slow.

Please suggest a solution with good performance (by good performance I mean selecting 30,000 videos in 10-20 minutes on an average PC).

  • Check whether your database supports full-text search. In general, any LIKE '%xxx%' results in a full table scan. If you do not use full-text indexes, the only alternative is to store every word of every record in a separate index table and look for exact matches in it, without LIKE - Mike
  • If you can't find anything suitable for full-text search, let me know and I can describe the homegrown index-table approach in more detail - Mike
  • Mike, I don't have much knowledge in this area. Could you implement what I described for a fee? If you're interested, message me on Skype (Thetur) and we'll work it out - thetur
  • Could you specify in the tags which database you are using? MsSQL? MySQL? Something else? - cyadvert
  • So which database do you have? And have you already tried googling "<your database name> full-text search"? No, I'm not for hire. I can basically help with the main query - Mike
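
The index-table idea from the comments can be sketched roughly as follows. This is only an illustration under stated assumptions, not anyone's actual implementation: it uses Python's built-in `sqlite3` with an in-memory database so it runs anywhere, and the names `video_words`, `tokenize`, `build_index`, and `search` are hypothetical. The same schema and query should carry over to MySQL with minor type changes. Each (word, video) pair is stored once, so a lookup is an exact match on an indexed column instead of a `LIKE '%...%'` scan, and the 0.7 threshold from the question becomes a `HAVING` condition.

```python
import re
import sqlite3


def tokenize(text):
    # Lowercase and split on non-word characters; \w is Unicode-aware in Python 3.
    return set(re.findall(r"\w+", text.lower()))


def build_index(conn, videos):
    """Create the videos table plus a (word, video_id) index table (hypothetical schema)."""
    videos = list(videos)
    conn.execute("CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute(
        "CREATE TABLE video_words ("
        " word TEXT NOT NULL, video_id INTEGER NOT NULL,"
        " PRIMARY KEY (word, video_id))"
    )
    conn.executemany("INSERT INTO videos (id, title) VALUES (?, ?)", videos)
    conn.executemany(
        "INSERT INTO video_words (word, video_id) VALUES (?, ?)",
        [(word, vid) for vid, title in videos for word in tokenize(title)],
    )


def search(conn, query, min_ratio=0.7, limit=20):
    """Return (id, title, ratio) rows where more than min_ratio of the query words match."""
    words = sorted(tokenize(query))
    if not words:
        return []
    placeholders = ", ".join("?" for _ in words)
    sql = (
        "SELECT v.id, v.title, COUNT(*) AS matched"
        " FROM video_words w JOIN videos v ON v.id = w.video_id"
        " WHERE w.word IN (" + placeholders + ")"
        " GROUP BY v.id, v.title"
        " HAVING COUNT(*) > ?"          # strictly more than min_ratio of the words
        " ORDER BY matched DESC, v.id"
        " LIMIT ?"
    )
    params = words + [min_ratio * len(words), limit]
    rows = conn.execute(sql, params).fetchall()
    return [(vid, title, matched / len(words)) for vid, title, matched in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_index(conn, [(1, "funny cat video"), (2, "funny cat playing piano")])
    print(search(conn, "funny cat piano"))
```

The index only matches whole words, so unlike `LIKE '%word%'` it will not find a query word inside a longer word; whether that is acceptable depends on the data.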

1 answer

You can slightly improve your first approach.
Build the query as before (with LIKE), but with two differences: 1. join the WHERE conditions with OR; 2. compute your own match score and sort the results by it. If word1 is present, add 1; if word2 is present, add 1; and so on. You can also add higher-weighted checks for phrases.

    SELECT *,
           (IF(title LIKE "%word1%", 1, 0)
          + IF(title LIKE "%word2%", 1, 0)
          + IF(title LIKE "%word3%", 1, 0)
          + IF(title LIKE "%word1 word2%", 3, 0)
          + IF(title LIKE "%word2 word3%", 3, 0)) AS relevance
    FROM videos
    WHERE title LIKE "%word1%"
       OR title LIKE "%word2%"
       OR title LIKE "%word3%"
    ORDER BY relevance DESC
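
Since a query like this has to be generated for each search request, here is a minimal sketch of a builder for it. The function names `escape_like` and `build_relevance_query` are hypothetical, and the `%s` placeholders assume a Python DB-API driver that uses that parameter style (PyMySQL, for example); the words are passed as parameters rather than pasted into the SQL text. Note that the resulting query still relies on `LIKE '%...%'`, so it does not by itself avoid the full table scan mentioned in the comments.

```python
def escape_like(term):
    # Escape LIKE wildcards (and the backslash escape character itself)
    # so that user input is matched literally.
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_relevance_query(words, phrase_weight=3):
    """Return (sql, params): +1 per matched word, +phrase_weight per adjacent word pair."""
    if not words:
        raise ValueError("at least one word is required")
    score_terms = []
    params = []
    # One point for every individual word found in the title.
    for word in words:
        score_terms.append("IF(title LIKE %s, 1, 0)")
        params.append("%" + escape_like(word) + "%")
    # Extra points when two consecutive query words appear as a phrase.
    for first, second in zip(words, words[1:]):
        score_terms.append("IF(title LIKE %s, {}, 0)".format(int(phrase_weight)))
        params.append("%" + escape_like(first) + " " + escape_like(second) + "%")
    # The WHERE clause keeps any row that contains at least one word.
    where_terms = ["title LIKE %s" for _ in words]
    params.extend("%" + escape_like(word) + "%" for word in words)
    sql = (
        "SELECT *, (" + " + ".join(score_terms) + ") AS relevance"
        " FROM videos"
        " WHERE " + " OR ".join(where_terms) +
        " ORDER BY relevance DESC"
    )
    return sql, params


if __name__ == "__main__":
    sql, params = build_relevance_query(["funny", "cat", "piano"])
    print(sql)
    print(params)
```

With a DB-API cursor this would be run as `cursor.execute(sql, params)`; adding a `LIMIT` is probably worthwhile when only the top results are needed.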