Hello. I have five languages: English, Portuguese, Spanish, Russian and Bulgarian. When a user enters a word, I need to determine how well (as a percentage) the word matches each of these languages. For example, the word "encyclopedia" is entered and I get the result: Russian 99%, Bulgarian 87%, Spanish 3%, English 3%, Portuguese 2%. (I made these numbers up.) I have looked through a bunch of APIs, but they all just detect the language of a text, and some give a % match for that one language. I need a score for exactly these five languages. Please help :)

  • You can write it yourself :) - Qwertiy
  • @Qwertiy, by "yourself" you mean matching the letters against each alphabet? If so, that is just a crude analysis. - VostokSisters
  • @VostokSisters, and how would you get the percentage match that way? Simply detecting the language of a word is something the very first API you find can do. - Tabigon
  • @andreycha I want to use an API, and I'm not quite sure which tags to add. - Tabigon
  • @Tabigon, by alphabet? Just check the letters: for each language, add a point to its counter whenever a letter belongs to that alphabet (keeping in mind that different alphabets share some letters), then compute the percentage match. It is easy, but crude; better not to do it this way. Done properly, this is a big task. - VostokSisters
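The letter-counting approach described in the last comment might look roughly like this. It is a minimal Python sketch, not anyone's actual code: the `ALPHABETS` table and the function name `alphabet_match` are hypothetical, and each alphabet is assumed to be its lowercase letters plus the accented letters commonly used in Spanish and Portuguese spelling.

```python
import string
from typing import Dict

# Assumed lowercase alphabets for the five languages.
LATIN = set(string.ascii_lowercase)
RUSSIAN = {chr(c) for c in range(ord("а"), ord("я") + 1)} | {"ё"}  # а..я plus ё
ALPHABETS = {
    "English": LATIN,
    "Spanish": LATIN | set("ñáéíóúü"),
    "Portuguese": LATIN | set("áâãàçéêíóôõú"),
    "Russian": RUSSIAN,
    "Bulgarian": RUSSIAN - set("ёыэ"),  # Bulgarian has no ё, ы, э
}


def alphabet_match(word: str) -> Dict[str, float]:
    """Percentage of the word's letters that belong to each alphabet."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return {lang: 0.0 for lang in ALPHABETS}
    return {
        lang: 100.0 * sum(ch in alphabet for ch in letters) / len(letters)
        for lang, alphabet in ALPHABETS.items()
    }


print(alphabet_match("энциклопедия"))
```

For "энциклопедия" Russian scores 100% and Bulgarian about 92%, because Bulgarian lacks "э". For a plain Latin word such as "encyclopedia", English, Spanish and Portuguese all score 100%, which is exactly the weakness pointed out above: letter membership alone cannot separate languages that share an alphabet.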

2 answers

For a word, you can compute the Levenshtein distance from it to each word in the dictionary, assign each of those words a posterior probability of error (assuming normally distributed errors, via Student's distribution), and then combine them using the formula for the probability of a sum of events.
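A rough Python sketch of this dictionary approach, under explicit assumptions: the dictionary is a plain list of words for one language, and instead of the Student's-distribution model mentioned above it uses a hypothetical Gaussian-shaped decay, exp(-d²/2σ²), to turn an edit distance d into a probability. The per-word probabilities are combined with the sum-of-events formula for independent events, 1 - ∏(1 - p_j). The names `match_probability` and `language_score` are illustrative only.

```python
import math
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if letters match)
            ))
        prev = cur
    return prev[-1]


def match_probability(distance: int, sigma: float = 1.0) -> float:
    """Hypothetical mapping: Gaussian-shaped decay of probability with distance."""
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))


def language_score(word: str, dictionary: Iterable[str], sigma: float = 1.0) -> float:
    """P(at least one dictionary word explains the input) = 1 - prod(1 - p_j)."""
    miss = 1.0  # probability that no dictionary word matches
    for entry in dictionary:
        miss *= 1.0 - match_probability(levenshtein(word, entry), sigma)
    return 1.0 - miss
```

Each call visits every dictionary entry, so the cost grows with the dictionary size times the product of the word lengths; this is the resource cost objected to in the comments. With large dictionaries the combined score also tends toward 1, so σ (or the distance-to-probability mapping itself) would need tuning.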

For a phrase, you can use the formula for the probability of a product of events, where the events are the errors in the individual words.
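Written out, the phrase rule amounts to the following, assuming the words are independent (an assumption the answer does not state explicitly). Here $w_1, \dots, w_n$ are the words of the phrase and $P(w_i \mid L)$ is the per-word probability for language $L$ obtained as in the previous paragraph; the log form avoids floating-point underflow for long phrases:

$$
P(\text{phrase} \mid L) = \prod_{i=1}^{n} P(w_i \mid L),
\qquad
\log P(\text{phrase} \mid L) = \sum_{i=1}^{n} \log P(w_i \mid L).
$$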

    Match against the alphabet (ideally using characters unique to each language, since in some languages the same character codes occur).
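To see which letters are actually unique among these five alphabets, one can compute, per language, the letters that occur in no other alphabet. This is a minimal Python sketch; the alphabet definitions and the names `unique_letters` and `distinctive_hits` are assumptions for illustration.

```python
import string
import unicodedata
from typing import Dict, List, Set

# Assumed lowercase alphabets (same assumptions as any alphabet-based check).
LATIN = set(string.ascii_lowercase)
RUSSIAN = {chr(c) for c in range(ord("а"), ord("я") + 1)} | {"ё"}
ALPHABETS: Dict[str, Set[str]] = {
    "English": LATIN,
    "Spanish": LATIN | set("ñáéíóúü"),
    "Portuguese": LATIN | set("áâãàçéêíóôõú"),
    "Russian": RUSSIAN,
    "Bulgarian": RUSSIAN - set("ёыэ"),
}


def unique_letters(alphabets: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each language, the letters that appear in no other alphabet."""
    result = {}
    for lang, letters in alphabets.items():
        others = set().union(*(a for name, a in alphabets.items() if name != lang))
        result[lang] = letters - others
    return result


UNIQUE = unique_letters(ALPHABETS)


def distinctive_hits(word: str) -> Dict[str, List[str]]:
    """Languages whose unique letters occur in the word, with the letters found."""
    # NFC turns e.g. "n" + combining tilde into the single code point "ñ".
    chars = set(unicodedata.normalize("NFC", word.lower()))
    return {lang: sorted(chars & u) for lang, u in UNIQUE.items() if chars & u}
```

Under these assumptions English and Bulgarian have no unique letters at all (their alphabets are subsets of the Spanish/Portuguese and Russian ones respectively), so a unique-letter check can only ever point toward Spanish, Portuguese or Russian; the other two need a different signal.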

    Edited version by @VostokSisters

    • The problem is not detecting the language; I repeat, any API can handle that. The problem is determining how well the word matches each of the five languages. One rather bad idea is to compare the word with every word of a language using some algorithm, get a % match for each, and take the largest as the % match for that language. But that is very resource-intensive... - Tabigon
    • Why not use the crude method of checking the alphabet for languages such as English, Portuguese, Spanish, Russian and Bulgarian? I think it will work fine. (With many more languages it might not work.) The dictionary variant is also good as an add-on for cases where the result is 50/50. - FORTRAN
    • In the end, I did what you described. I compiled 5 alphabets and compared each word against each alphabet. I also took into account a couple of rules I found (for example, in Bulgarian "ъ" can appear in words between two consonants and at the end, which doesn't happen in Russian). - Tabigon
    • @Tabigon, did you test it? What was the result? My main point is that for 5 languages there is no need to overengineer things (unlike, say, 100 languages, as at Google). - FORTRAN
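A rule like the "ъ" one mentioned in the comments could be layered on top of the alphabet scores. The sketch below encodes only the part that holds for standard modern spelling: in Russian "ъ" is used only before е, ё, ю or я (as in "подъезд"), so "ъ" with a consonant on each side points to Bulgarian (as in "път" or "България"); the end-of-word case is left out. The consonant list, the function name `apply_hard_sign_rule` and the fixed 10-point shift are hypothetical choices, not the author's actual implementation.

```python
import re
from typing import Dict

# Assumed list of lowercase Cyrillic consonant letters (й counted as a consonant).
CONSONANTS = "бвгджзйклмнпрстфхцчшщ"
# "ъ" with a consonant on both sides, e.g. the "бъл" in "българия".
HARD_SIGN_BETWEEN_CONSONANTS = re.compile(f"[{CONSONANTS}]ъ[{CONSONANTS}]")


def apply_hard_sign_rule(word: str, scores: Dict[str, float],
                         shift: float = 10.0) -> Dict[str, float]:
    """Hypothetical rule: move `shift` points from Russian to Bulgarian on a match."""
    adjusted = dict(scores)  # leave the caller's dict untouched
    if HARD_SIGN_BETWEEN_CONSONANTS.search(word.lower()):
        adjusted["Bulgarian"] = min(100.0, adjusted.get("Bulgarian", 0.0) + shift)
        adjusted["Russian"] = max(0.0, adjusted.get("Russian", 0.0) - shift)
    return adjusted
```

The scores are clamped to the 0 to 100 range, so the rule can be applied after any alphabet-based scoring without producing impossible percentages.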