What metrics can be used to measure the similarity of two vectors of word combinations, given that the similarity between any two elements is known? For example:

  • ("Media", "television", "television studios and television companies")
  • ("Society", "Politics and Society", "mass media")

    2 answers

    You can first run each set through a dictionary of synonyms to reduce every word to a canonical form (so that only one word from each group of synonyms is used in the calculation). Then, for each set of words, build a frequency histogram (in the simplest case, a vector of zeros and ones indicating whether each word is present or not). Finally, compare the histograms with each other, for example using the correlation coefficient, or, if bit histograms are used, an analogue of correlation called the Tanimoto coefficient: the number of words present in both sets divided by the number of words present in at least one of them.
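    The steps above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the `SYNONYMS` table is a made-up assumption (a real one would come from a thesaurus), and the helper names `canonical`, `to_bits`, `tanimoto` and `pearson` are hypothetical.

```python
from math import sqrt

# Hypothetical synonym table (an assumption for illustration only):
# maps each lower-cased phrase to the canonical form of its group.
SYNONYMS = {
    "television": "media",
    "mass media": "media",
    "television studios and television companies": "media",
    "politics and society": "society",
}


def canonical(term):
    """Reduce a phrase to its canonical form; unknown phrases map to themselves."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)


def to_bits(terms, vocabulary):
    """Bit histogram: 1 if the vocabulary entry occurs in the set, else 0."""
    present = {canonical(t) for t in terms}
    return [1 if v in present else 0 for v in vocabulary]


def tanimoto(a, b):
    """Tanimoto coefficient of two bit vectors: |A and B| / |A or B|."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 1.0  # two empty sets count as identical


def pearson(a, b):
    """Pearson correlation of two histograms; 0.0 if either one is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / sqrt(var_a * var_b)


first = ["Media", "television", "television studios and television companies"]
second = ["Society", "Politics and Society", "mass media"]

# Shared vocabulary of canonical forms, in a fixed (sorted) order.
vocabulary = sorted({canonical(t) for t in first + second})
bits_first = to_bits(first, vocabulary)
bits_second = to_bits(second, vocabulary)

print(vocabulary, bits_first, bits_second)
print("Tanimoto:", tanimoto(bits_first, bits_second))
```

    With a vocabulary this small, correlation is not very informative (a histogram of all ones has zero variance), so the Tanimoto coefficient is probably the more useful of the two on short bit vectors.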

      Try looking into fuzzy sets. There is also the Hamming distance, which counts the positions at which two vectors of equal length differ.
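      As a rough sketch of both ideas: the Hamming distance on bit vectors is a one-liner, and one possible fuzzy-set reading (an assumption here, not something the answer spells out) is to treat the known pairwise similarity as a degree of membership and compute a fuzzy Jaccard ratio. The `PAIRS` table, its scores, and the function names are all hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def membership(term, terms, sim):
    """Fuzzy membership of `term` in `terms`: its best pairwise similarity."""
    return max((sim(term, t) for t in terms), default=0.0)


def fuzzy_jaccard(set_a, set_b, sim):
    """Sum of min memberships over sum of max memberships (fuzzy Jaccard)."""
    universe = set(set_a) | set(set_b)
    mu_a = {t: membership(t, set_a, sim) for t in universe}
    mu_b = {t: membership(t, set_b, sim) for t in universe}
    inter = sum(min(mu_a[t], mu_b[t]) for t in universe)
    union = sum(max(mu_a[t], mu_b[t]) for t in universe)
    return inter / union if union else 1.0


# Hypothetical pairwise similarities (made-up scores, for illustration only).
PAIRS = {
    frozenset(("media", "television")): 0.8,
    frozenset(("media", "mass media")): 0.9,
}


def sim(x, y):
    """Symmetric similarity lookup: 1.0 for identical terms, 0.0 if unknown."""
    if x == y:
        return 1.0
    return PAIRS.get(frozenset((x, y)), 0.0)


print(hamming([1, 0], [1, 1]))  # 1
print(fuzzy_jaccard(["media", "television"], ["society", "mass media"], sim))
```

      If `sim` only ever returns 1.0 for identical terms and 0.0 otherwise, `fuzzy_jaccard` reduces to the ordinary Jaccard (Tanimoto) coefficient, so the fuzzy variant can be seen as a softer version of the bit-vector approach.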