What algorithms or methods can be used to tell, for example, a text about cooking apart from a text about programming?


    The simplest method is to count word frequencies against thematic (topic-specific) dictionaries.
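    A minimal sketch of the dictionary approach follows. It assumes two small, hand-written keyword sets; `TOPIC_DICTIONARIES`, its word lists and the helper names are hypothetical and only for illustration. Each topic is scored by the share of the text's words that appear in its dictionary, and the highest-scoring topic wins.

```python
import re
from collections import Counter

# Hypothetical topic dictionaries; real ones would be much larger and curated.
TOPIC_DICTIONARIES = {
    "cooking": {"recipe", "oven", "salt", "boil", "flour", "bake", "minutes", "stir"},
    "programming": {"function", "variable", "compile", "loop", "code", "bug", "python", "array"},
}


def tokenize(text: str) -> list[str]:
    # Lowercase and keep only runs of Latin letters.
    return re.findall(r"[a-z]+", text.lower())


def score_topics(text: str) -> dict[str, float]:
    # Share of the text's tokens that appear in each topic's dictionary.
    tokens = tokenize(text)
    if not tokens:
        return {topic: 0.0 for topic in TOPIC_DICTIONARIES}
    counts = Counter(tokens)
    return {
        topic: sum(counts[w] for w in words) / len(tokens)
        for topic, words in TOPIC_DICTIONARIES.items()
    }


def classify(text: str) -> str:
    # Pick the topic with the highest dictionary share.
    scores = score_topics(text)
    return max(scores, key=scores.get)


print(classify("Stir the flour and bake in the oven for 20 minutes"))  # cooking
print(classify("The function has a bug in the loop"))  # programming
```

    The score is a raw share of matching words, not a probability. Ties, such as a text containing no dictionary words at all, go to whichever topic is listed first.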

    • This is a very poor method. Naive Bayes seems to work reasonably well: it learns by itself from a training sample and does not depend on the programmer's guesswork. Its output is also more than a “similar / not similar” count. It gives the probability that the text belongs to a given class. - uhbif19

    Latent semantic analysis (LSA)
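    A small, dependency-free sketch of LSA follows. It assumes raw term counts (no TF-IDF weighting or stop-word removal) and a toy four-document corpus made up for illustration, and all function names are hypothetical. Instead of calling an SVD routine, it finds the top-k right singular vectors of the term-document matrix A as eigenvectors of A^T A by power iteration. It then folds a new text into the same k-dimensional latent space and labels the text by its most cosine-similar training document.

```python
import math
import random
import re
from collections import Counter
from dataclasses import dataclass

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else v

def orthogonalize(v, basis):
    # Subtract the components of v along already-found orthonormal vectors.
    for u in basis:
        p = dot(v, u)
        v = [x - p * y for x, y in zip(v, u)]
    return v

def top_eigenpairs(G, k, iters=1000, seed=0):
    # Power iteration for the k largest eigenpairs of the symmetric matrix G.
    rng = random.Random(seed)
    vals, vecs = [], []
    for _ in range(k):
        v = normalize(orthogonalize([rng.uniform(-1, 1) for _ in G], vecs))
        for _ in range(iters):
            v = normalize(orthogonalize([dot(row, v) for row in G], vecs))
        lam = dot(v, [dot(row, v) for row in G])  # Rayleigh quotient
        if lam <= 1e-12:  # no significant dimensions left
            break
        vals.append(lam)
        vecs.append(v)
    return vals, vecs

@dataclass
class LSAModel:
    index: dict       # term -> position in a term-count vector
    docs_terms: list  # A^T: one term-count vector per training document
    eigvals: list     # sigma_i^2 for each latent dimension
    V: list           # right singular vectors v_i of A (eigenvectors of A^T A)

    def doc_coords(self, j):
        # Training document j in latent space is row j of V_k.
        return [v[j] for v in self.V]

def term_vector(text, index):
    vec = [0.0] * len(index)
    for w, c in Counter(tokenize(text)).items():
        if w in index:  # words unseen in training are ignored
            vec[index[w]] = float(c)
    return vec

def lsa_fit(docs, k):
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    At = [term_vector(d, index) for d in docs]
    G = [[dot(a, b) for b in At] for a in At]  # A^T A, documents x documents
    vals, V = top_eigenpairs(G, k)
    return LSAModel(index, At, vals, V)

def fold_in(text, model):
    # q_hat = Sigma^-1 U^T q, rewritten as Sigma^-2 V^T (A^T q) since U = A V Sigma^-1.
    q = term_vector(text, model.index)
    Atq = [dot(row, q) for row in model.docs_terms]
    return [dot(v, Atq) / lam for v, lam in zip(model.V, model.eigvals)]

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def classify(text, model, labels):
    q = fold_in(text, model)
    sims = [cosine(q, model.doc_coords(j)) for j in range(len(labels))]
    return labels[max(range(len(sims)), key=sims.__getitem__)]

docs = ["oven bake bread flour", "oven bake cake flour",
        "code compile function bug", "code function loop bug array"]
labels = ["cooking", "cooking", "programming", "programming"]
model = lsa_fit(docs, k=2)
print(classify("bread flour oven", model, labels))  # cooking
```

    In this toy corpus, the query "cake" has a latent-space cosine close to 1 with the first document, even though that document never contains the word, because both load on the same "cooking" dimension. In raw term space the cosine would be 0. For real corpora, a library SVD (for example, from NumPy or scikit-learn) would replace the hand-rolled power iteration.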

    • @Merlin, try to write more detailed answers. - Nicolas Chabanovsky ♦

    The simplest classic approach is a Bayes classifier. Such classifiers are used, for example, in spam filters (although modern spam filtering has of course moved far beyond them).
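    A minimal sketch of one common variant follows: a multinomial naive Bayes classifier with add-one (Laplace) smoothing. The class name `NaiveBayes` and the four training sentences are made up for illustration. The classifier learns word statistics from labeled examples, and `predict_proba` returns a probability for each class rather than a bare match count.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_scores(self, text):
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior P(c)
            for w in tokenize(text):
                if w in self.vocab:  # skip words never seen in training
                    # log P(w | c) with add-one smoothing over the vocabulary
                    score += math.log((self.word_counts[c][w] + 1)
                                      / (self.total_words[c] + len(self.vocab)))
            scores[c] = score
        return scores

    def predict_proba(self, text):
        # Normalize exp(log score) into probabilities, shifting by the max for stability.
        scores = self.log_scores(text)
        top = max(scores.values())
        weights = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(weights.values())
        return {c: w / total for c, w in weights.items()}

    def predict(self, text):
        proba = self.predict_proba(text)
        return max(proba, key=proba.get)


train_texts = [
    "Preheat the oven and bake the bread",
    "Boil the pasta and add salt",
    "The function returns an array",
    "Fix the bug in the loop",
]
train_labels = ["cooking", "cooking", "programming", "programming"]
nb = NaiveBayes().fit(train_texts, train_labels)
print(nb.predict("bake bread with salt"))  # cooking
print(nb.predict_proba("the loop has a bug"))
```

    Working in log space avoids underflow when many small word probabilities are multiplied. Subtracting the largest score before exponentiating keeps the normalization numerically stable.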

    There is also a very powerful, feature-rich library for text classification: DKPro TC.