The site text.ru has a tool that measures a text's uniqueness, spelling, amount of "water", and "spamming". What algorithm is behind the last two metrics?
- I don't know exactly what was done there, but I know something about this subject. What problem are you trying to solve? - Alexander Muksimov
- @Alexander Muksimov, I'm wondering how a machine can determine such data, where it comes from and how. It's a personal interest. - Node_pro
- Here is one possible approach: en.wikipedia.org/wiki/Automated_essay_scoring#Criticism - Alexander Muksimov
1 answer
The source you mentioned says this about "water":
This parameter shows the percentage of the text made up of stop words, idioms, figures of speech, and connecting words and phrases that are not significant and carry no semantic load.
That is, someone has compiled a dictionary of words and expressions that are "meaningless and carry no semantic load" (according to the authors; personally, I lose the meaning of a text without these words). Finding the words of a given text in that dictionary and calling them "water" is then a purely technical matter.
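To illustrate the dictionary idea, here is a minimal sketch of how such a "water" percentage could be computed. It is only a guess at the general approach, not text.ru's actual algorithm: the `WATER_WORDS` list and the `water_percent` function are hypothetical, and the real dictionary is not public.

```python
import re

# Hypothetical "water" dictionary; the list actually used by text.ru is unknown.
WATER_WORDS = {"really", "just", "very", "basically", "actually", "however"}


def tokenize(text):
    """Split text into lowercase words made of letters only."""
    return re.findall(r"[^\W\d_]+", text.lower())


def water_percent(text, dictionary=WATER_WORDS):
    """Return the share of words (in percent) that are found in the dictionary."""
    words = tokenize(text)
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in dictionary)
    return 100.0 * hits / len(words)


print(water_percent("This is really just a very simple test"))
```

This sketch matches single words only. Multi-word expressions such as idioms would need phrase matching on top of it.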
About "spamming":
The percentage of spamming reflects the number of search keywords in the text. The more keywords in the text, the higher its spamming.
Again, this is a dictionary of some search words. Find them, match them, and call the result whatever we like.
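The same matching can be sketched for "spamming", this time also counting how often each keyword occurs. Again, this is an assumption about the general approach rather than the site's real method: `SEARCH_KEYWORDS` and `spam_report` are made-up names, and the keyword list is invented for the example.

```python
import re
from collections import Counter

# Hypothetical list of search keywords; the real list is not known.
SEARCH_KEYWORDS = {"buy", "cheap", "laptop", "price"}


def spam_report(text, keywords=SEARCH_KEYWORDS):
    """Return (percent of words that are keywords, per-keyword counts)."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(word for word in words if word in keywords)
    if not words:
        return 0.0, counts
    percent = 100.0 * sum(counts.values()) / len(words)
    return percent, counts


percent, counts = spam_report("Buy a cheap laptop now, the cheap laptop deals here")
print(percent, counts.most_common())
```

The per-keyword counts make it easy to see which words drive the score up.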
Personally, I would not assume that any artificial intelligence or semantic networks are involved there.