Hello. I've received an order that essentially comes down to a plain network of articles searchable only by tags. The catch is that the client promises over 9000 visits and several million records, so a query of the form

SELECT [...] WHERE tags LIKE '%tag1%tag2%'

seems unacceptable. So far the only idea I have is a tag table with an ID plus a cross-reference table [tagID, articleID], but I'm afraid of running into the sheer size of the latter: a record can have several dozen, or even hundreds, of tags. Sequential search (click, view, click, view, and so on) also has to be taken into account.
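For reference, here is a minimal sketch of the two-table layout described above, using Python's built-in sqlite3 module so it can be run as is. The table name `article_tags`, the `name` column, and the helper functions are assumptions made for illustration; only [tagID, articleID] comes from the question. It finds articles that carry all of the requested tags with a GROUP BY / HAVING COUNT query instead of LIKE.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a tag table and a cross-reference table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE tags (
            tagID INTEGER PRIMARY KEY,
            name  TEXT UNIQUE NOT NULL
        );
        -- The composite primary key doubles as the index for lookups by tag.
        CREATE TABLE article_tags (
            tagID     INTEGER NOT NULL REFERENCES tags(tagID),
            articleID INTEGER NOT NULL,
            PRIMARY KEY (tagID, articleID)
        ) WITHOUT ROWID;
    """)
    return db


def tag_article(db, article_id, tag_names):
    """Attach tags to an article, creating tag rows on first use."""
    for name in tag_names:
        db.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (name,))
        tag_id = db.execute(
            "SELECT tagID FROM tags WHERE name = ?", (name,)
        ).fetchone()[0]
        db.execute(
            "INSERT OR IGNORE INTO article_tags (tagID, articleID) VALUES (?, ?)",
            (tag_id, article_id),
        )


def articles_with_all_tags(db, tag_names):
    """Return the IDs of articles that carry every one of the given tags."""
    names = sorted(set(tag_names))
    if not names:
        return []
    placeholders = ", ".join("?" for _ in names)
    rows = db.execute(
        f"""
        SELECT link.articleID
        FROM article_tags AS link
        JOIN tags AS t ON t.tagID = link.tagID
        WHERE t.name IN ({placeholders})
        GROUP BY link.articleID
        HAVING COUNT(*) = ?      -- one matching row per requested tag
        ORDER BY link.articleID
        """,
        (*names, len(names)),
    ).fetchall()
    return [row[0] for row in rows]
```

The cross-reference table does grow to (articles x tags per article) rows, but each row is just two integers, and the lookup goes through the primary-key index rather than scanning text. Whether that is fast enough at the stated volume would need to be measured on the real database.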

So, what would you advise? // Yes, it's a system similar to the one on hashcode: imagine ten million questions with 30-200 tags each.

    1 answer

    Probably like this: store the tags in a separate table, the same way you would for a small site. On the site itself, don't serve results from real-time queries; prepare them in advance. This cache only needs to be recalculated when an article is added: we add the article and recalculate only the tags it contains.

    • Oh, I see. So, did I understand correctly: create a table [tagID, articleIDs] and update it whenever tags change? And then give the user the intersection of the values. That puts more load on the admin panel, but that's not a problem there. It makes sense. I'll keep listening for other suggestions, but this looks like a "pass", thanks =) - Sh4dow
    • This approach does have problems. For example, an article gets added but its tags are not recounted, or tags are not recounted when an article is deleted. It's best to do a full recount regularly (for example, once a day). That means the site will sometimes show data that is not entirely accurate, but that's not a big deal. - Pavel Vladimirov
    • Take avito.ru, for example. When an ad is added there, even after the moderator has approved it, it doesn't appear in the listings immediately: they reindex every 5-10 minutes. The same goes for deletion: the ad doesn't disappear right away. This is how indexing servers work (Sphinx, for example). I've personally used Sphinx for search, and I think it can be used for searching by tags as well. - Pavel Vladimirov
    • I doubt anyone will offer you a real-time query that works fast. So in any case, dig towards a cache / index: either build it yourself or use dedicated software. habrahabr.ru/blogs/webdev/30594 And yes, search engines aren't necessarily about full-text search; they can work on keys too. Don't forget about the delta index either. - Pavel Vladimirov
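The cache idea from the answer and the comments above can be sketched roughly as follows. This is an in-memory illustration only, not how Sphinx or any particular site does it: the class name `TagIndex` and all of its methods are hypothetical, and plain dictionaries stand in for the articles table and the prepared per-tag lists. It shows three things: recalculating only the affected tags when an article is added, answering a search by intersecting the prepared lists, and a periodic full recount that repairs any drift.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory stand-in for a prepared tag -> article IDs cache."""

    def __init__(self):
        self.articles = {}  # source of truth: article ID -> set of tags
        self.cache = {}     # prepared data: tag -> frozenset of article IDs

    def _recalculate(self, tag):
        # A real system would run an indexed query here; this sketch scans.
        ids = frozenset(a for a, tags in self.articles.items() if tag in tags)
        if ids:
            self.cache[tag] = ids
        else:
            self.cache.pop(tag, None)

    def add_article(self, article_id, tags):
        """Store the article and recalculate only the tags it touches."""
        old_tags = self.articles.get(article_id, set())
        new_tags = set(tags)
        self.articles[article_id] = new_tags
        for tag in old_tags | new_tags:
            self._recalculate(tag)

    def full_recount(self):
        """Rebuild the whole cache from the source of truth (e.g. once a day)."""
        fresh = defaultdict(set)
        for article_id, tags in self.articles.items():
            for tag in tags:
                fresh[tag].add(article_id)
        self.cache = {tag: frozenset(ids) for tag, ids in fresh.items()}

    def search(self, tags):
        """Return sorted IDs of articles carrying all tags, using only the cache."""
        sets = sorted((self.cache.get(t, frozenset()) for t in set(tags)), key=len)
        if not sets:
            return []
        result = sets[0]          # start from the smallest list
        for ids in sets[1:]:
            result = result & ids
        return sorted(result)
```

If an article is removed from `articles` without a recount, `search` keeps returning it until `full_recount` runs. That is the "not entirely accurate data" trade-off mentioned in the comments.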