Hey.

How is FULLTEXT indexing done? I'm not asking which commands to use, but how to picture the process of building such an index. For example: a column with TEXT data is taken, all of the column's data is merged together, all short words are removed (by default, a word shorter than four characters), noise words are removed (and, for, around ...), and what remains is indexed (arranged alphabetically). I don't know whether I've described this correctly; most likely not.


    It turned out that I had described it correctly. On another forum, someone posted a picture that makes everything clear. The whole indexing process can be understood from that picture.

    That is, a column is taken (in this example it is called Documents; it can hold TEXT or VARCHAR data), and the words from all of its rows are thrown into one heap. Short words are dropped from this heap (a word counts as short when it is below a minimum length, which can be configured), and so are stop words such as "for", "a", and the like. What remains is sorted alphabetically and stored in an index table, together with references back to the rows of the main table.
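The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not how MySQL implements it: `MIN_WORD_LEN`, `STOP_WORDS`, `build_index` and the sample rows are made-up names and values, the minimum length of four comes from the question, and the stop-word list is a tiny stand-in for the real one.

```python
import re
from collections import defaultdict

# Assumption: words shorter than four characters are dropped, as in the question.
MIN_WORD_LEN = 4
# Assumption: a tiny illustrative stop-word list, not MySQL's built-in one.
STOP_WORDS = {"about", "from", "over", "that", "this", "with"}


def build_index(rows):
    """Map each kept word to the sorted ids of the rows that contain it."""
    postings = defaultdict(set)
    for row_id, text in rows.items():
        # Split the text into lowercase word tokens.
        for word in re.findall(r"[a-z0-9_]+", text.lower()):
            if len(word) < MIN_WORD_LEN or word in STOP_WORDS:
                continue  # short word or stop word: not indexed
            postings[word].add(row_id)
    # Sorting the words mirrors the "sorted alphabetically" step.
    return {word: sorted(ids) for word, ids in sorted(postings.items())}


# Hypothetical contents of the Documents column, keyed by row id.
rows = {
    1: "The quick brown fox",
    2: "A fox jumps over the lazy dog",
    3: "Quick thinking about lazy evaluation",
}

index = build_index(rows)
for word, ids in index.items():
    print(word, ids)
```

Each entry of `index` plays the role of one row in the index table: the word, plus the ids of the rows in the main table where it occurs. Words like "fox" and "the" never make it in because they are too short, and "about" and "over" are dropped as stop words.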

      The subject is covered, for example, here: https://dev.mysql.com/doc/refman/5.7/en/innodb-fulltext-index.html I think that will be enough for an initial understanding.

      Very briefly: the text is split into tokens (words), stop words and short words are removed, and the result is inverted, so that each word points to the places where it occurs. In that form it is matched against the previously indexed contents of the table(s), which are stored in the same inverted form.
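To make the lookup side concrete, here is a rough Python sketch under the same kind of assumptions: `tokenize`, `build_index`, `search`, the minimum length of four and the stop-word list are all hypothetical, and ranking rows by the number of matched words is a simplification chosen for the example, not MySQL's relevance formula.

```python
import re

MIN_WORD_LEN = 4  # assumption: minimum indexed word length
STOP_WORDS = {"about", "from", "that", "this", "with"}  # illustrative only


def tokenize(text):
    """Split text into lowercase words, dropping short words and stop words."""
    words = re.findall(r"[a-z0-9_]+", text.lower())
    return [w for w in words if len(w) >= MIN_WORD_LEN and w not in STOP_WORDS]


def build_index(rows):
    """Inverted form: word -> set of row ids that contain it."""
    index = {}
    for row_id, text in rows.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(row_id)
    return index


def search(index, query):
    """Return ids of rows matching any query word, most matched words first."""
    hits = {}
    # The query goes through the same tokenizer as the indexed text.
    for word in tokenize(query):
        for row_id in index.get(word, ()):
            hits[row_id] = hits.get(row_id, 0) + 1
    return sorted(hits, key=lambda row_id: (-hits[row_id], row_id))


# Hypothetical table contents, keyed by row id.
rows = {
    1: "MySQL fulltext index tutorial",
    2: "Building an inverted index by hand",
    3: "Cooking with garlic",
}

index = build_index(rows)
result = search(index, "the inverted index")
print(result)
```

The point of the sketch is that the query and the stored text pass through the same cleaning step, so a search only ever compares words that could have been indexed in the first place.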

      • Although the link may answer the question, it is better to include the essential points here and give the link as a source. If the linked page changes, a link-only answer can become invalid. - From the review queue - cheops
      • OK, "the most important" part has been moved into the answer. - Akina