How is the mechanism that determines the amount of “water” in the text implemented?

Question

The site text.ru has a tool that checks a text's uniqueness, spelling, amount of "water", and spam level. What algorithm is behind the last two?

I don't know exactly what was done there, but I know a bit about this topic.
@Alexander Muksimov, I'm wondering how a machine can determine such data and where it gets it from; it's a personal interest.
This is one possible approach: en.wikipedia.org/wiki/Automated_essay_scoring#Criticism

Answer 1 · 2017-01-10T21:57:33

The source you cited says this about "water":

This parameter shows the percentage of the text made up of stop words, set phrases, figures of speech, and linking words that are not significant and carry no semantic load.

That is, someone compiled a dictionary of words and expressions that are "meaningless and carry no semantic load" (in the authors' opinion; personally, I lose the meaning of a text without these words). Finding the words of a given text that match that dictionary and calling them "water" is then just a matter of technique.
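A minimal sketch of that dictionary-matching idea, in Python. The word list `FILLER_WORDS` and the function name `water_percentage` are made up for illustration; the actual text.ru dictionary and formula are not known, and this version matches single words only, not multi-word phrases.

```python
import re

# Hypothetical filler dictionary; the real list used by the site is unknown.
FILLER_WORDS = {"the", "a", "an", "and", "of", "in", "that", "is", "as", "well"}


def water_percentage(text, filler_words=FILLER_WORDS):
    """Return the share of words in `text` found in the filler dictionary, in percent."""
    # Lowercase the text and split it into word tokens.
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    # Count every occurrence of a dictionary word.
    filler_count = sum(1 for word in words if word in filler_words)
    return 100.0 * filler_count / len(words)


print(water_percentage("the cat and the dog"))
```

Here three of the five words ("the", "and", "the") are in the dictionary, so the result is 60 percent. The quality of such a metric depends entirely on how the dictionary was compiled.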

About "spam level":

The spam percentage reflects the number of search keywords in the text. The more keywords the text contains, the higher its spam level.

This is also a dictionary, this time of search keywords. Select, match, and call the result whatever you like.

Personally, I doubt that any artificial intelligence or semantic networks are involved there.
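The same matching can be sketched for keywords. This assumes the score is simply the share of word occurrences that appear in a keyword list; the function name `spam_percentage` and the sample keywords are hypothetical, and how the site actually obtains its keywords and computes the figure is not known.

```python
import re


def spam_percentage(text, keywords):
    """Return the share of words in `text` that are search keywords, in percent."""
    # Lowercase the text and split it into word tokens.
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    # Normalize the keyword list once so matching is case-insensitive.
    keyword_set = {keyword.lower() for keyword in keywords}
    hits = sum(1 for word in words if word in keyword_set)
    return 100.0 * hits / len(words)


print(spam_percentage("buy shoes now here", ["buy", "shoes"]))
```

Two of the four words are keywords, so the score is 50 percent; repeating "buy" or "shoes" more often pushes it higher.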
