What algorithms or methods can be used to distinguish, for example, a text about cooking from a text about programming?
3 answers
The simplest method is to count how often words from thematic dictionaries (one word list per topic) occur in the text.
- This is a very poor method. Naive Bayes works in a somewhat similar way, but it learns the word statistics itself from a training sample and does not rely on the programmer's guesswork. Its output is also not just a "similar / not similar" verdict, but the probability that the text belongs to each group. - uhbif19
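To illustrate the dictionary idea, here is a minimal sketch in Python. The word lists in `TOPIC_WORDS` and the helper names `topic_scores` and `guess_topic` are made up for the example; real thematic dictionaries would be far larger and probably built from a corpus.

```python
import re
from collections import Counter

# Hypothetical thematic dictionaries (illustrative only, far too small for real use).
TOPIC_WORDS = {
    "cooking": {"recipe", "oven", "bake", "flour", "boil", "sauce", "salt"},
    "programming": {"code", "function", "compiler", "variable", "loop", "bug", "class"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def topic_scores(text):
    """Return, per topic, the share of words in the text found in that topic's dictionary."""
    counts = Counter(tokenize(text))
    total = sum(counts.values()) or 1  # avoid division by zero on empty text
    return {
        topic: sum(counts[word] for word in words) / total
        for topic, words in TOPIC_WORDS.items()
    }


def guess_topic(text):
    """Pick the topic with the highest share, or None if no dictionary word occurs."""
    scores = topic_scores(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_topic("Bake the flour mixture in the oven and add salt"))
```

The obvious weakness is that the result depends entirely on how well the word lists were chosen by hand.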
|
Latent semantic analysis (LSA).
- @Merlin, please try to write more detailed answers. - Nicolas Chabanovsky ♦
|
The easiest, classic way is a Bayes classifier. Such classifiers are used, for example, in spam filters (although modern filters clearly rely on much more than that).
There is also a very powerful and feature-rich library for text classification: DKPro TC.
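As a rough illustration of a Bayes classifier, below is a small multinomial naive Bayes sketch in plain Python with add-one (Laplace) smoothing. The class name `NaiveBayes`, its methods, and the four training sentences are invented for the example; it is a toy, not how any particular spam filter or library implements it.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Toy multinomial naive Bayes text classifier (illustrative sketch)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocab = set()                       # all words seen during training

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def predict_proba(self, text):
        """Return {label: probability} for the given text."""
        words = [w for w in tokenize(text) if w in self.vocab]  # ignore unseen words
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in self.doc_counts:
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.doc_counts[label] / total_docs)  # class prior
            for w in words:
                # Add-one smoothing keeps unseen (word, label) pairs from zeroing the product.
                score += math.log(
                    (self.word_counts[label][w] + 1) / (total_words + len(self.vocab))
                )
            log_scores[label] = score
        # Convert log scores to normalized probabilities in a numerically stable way.
        top = max(log_scores.values())
        exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {label: value / norm for label, value in exp_scores.items()}


# Tiny made-up training sample; a real one would contain many texts per topic.
clf = NaiveBayes()
clf.train("Bake the bread in the oven with flour and salt", "cooking")
clf.train("Boil the sauce and add salt", "cooking")
clf.train("The function has a bug in the loop", "programming")
clf.train("Compile the code and fix the variable", "programming")

print(clf.predict_proba("fix the bug in the code"))
```

Unlike a hand-written dictionary, the word statistics here come from the training texts, and the result is a probability per topic rather than a bare yes/no.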
|