Percentage match of a word against each selected language

Question

Hello. I have five languages: English, Portuguese, Spanish, Russian, and Bulgarian. When a user enters a word, I need to determine how closely the word matches each of these languages, as a percentage. For example, the word "encyclopedia" is entered and I get back: Russian - 99%, Bulgarian - 87%, Spanish - 3%, English - 3%, Portuguese - 2%. (The numbers are made up.) I have looked through a bunch of APIs, but they all just detect the language of the text, and some give a percentage match for that one language. I need percentages for exactly these five languages. Please help :)

@VostokSisters, and how would you compute the match percentage?
Simply detecting the language of a word is something the very first API you find can do.
@andreycha I want to use an API, and I don't quite understand which tags to put on the question.
Just check the letters: for each letter of the word, add a point to the counter of every language whose alphabet contains it (bearing in mind that different alphabets share some letters), then turn the counters into match percentages.

Yuri Negometyanov · Answer 1 · 2017-02-05T08:30:56

For a single word, you can compute the Levenshtein distance from it to each word in the dictionary, assign to each of them the posterior probability of an error (under a normal error distribution, this follows Student's distribution), and then apply the formulas for the probability of a sum of events.
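The distance part of this idea can be sketched in Python. This is only an illustration, not the answerer's method: it replaces the posterior-probability step with a much cruder score (one minus the normalized distance to the nearest dictionary word). The names `closeness`, `language_scores`, and `DICTIONARIES`, as well as the three-word dictionaries, are made up for the example.

```python
from typing import Dict, Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def closeness(word: str, dictionary: Iterable[str]) -> float:
    """Score in [0, 1]: 1 minus the normalized distance to the nearest entry.

    Assumption: a crude stand-in for the probability model in the answer.
    The dictionary must not be empty.
    """
    word = word.lower()
    best = min(
        levenshtein(word, entry.lower()) / max(len(word), len(entry), 1)
        for entry in dictionary
    )
    return 1.0 - best


# Tiny hypothetical dictionaries; a real one would hold many thousands of words.
DICTIONARIES: Dict[str, list] = {
    "English": ["encyclopedia", "language", "word"],
    "Spanish": ["enciclopedia", "lengua", "palabra"],
    "Portuguese": ["enciclopédia", "língua", "palavra"],
}


def language_scores(word: str) -> Dict[str, float]:
    """Closeness of the word to each language's dictionary."""
    return {lang: closeness(word, words) for lang, words in DICTIONARIES.items()}
```

Calling `language_scores("encyclopedia")` gives 1.0 for English and lower values for Spanish and Portuguese. The cost grows with dictionary size, since every query is compared against every entry.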

For a phrase, you can use the formula for the probability of a product of events, applied to the errors in the individual words.
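One possible reading of the phrase-level suggestion, sketched in Python: treat the words as independent, multiply the per-word scores for each language, and normalize. The independence assumption, the normalization step, and the name `phrase_scores` are all assumptions made for this illustration; the answer does not spell out the formula.

```python
from math import prod
from typing import Dict, List


def phrase_scores(word_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine per-word scores into per-language scores for a whole phrase.

    word_scores holds one dict per word, mapping language -> score in [0, 1].
    Assumption: words are independent, so the scores are simply multiplied.
    """
    if not word_scores:
        raise ValueError("need at least one word")
    languages = word_scores[0].keys()
    joint = {lang: prod(ws[lang] for ws in word_scores) for lang in languages}
    total = sum(joint.values())
    if total == 0:
        return {lang: 0.0 for lang in joint}
    # Normalize so the results sum to 1 and can be read as shares.
    return {lang: value / total for lang, value in joint.items()}
```

For two words scored `{"Russian": 0.9, "Bulgarian": 0.8}` and `{"Russian": 0.5, "Bulgarian": 1.0}`, the products are 0.45 and 0.8, so the shares come out at roughly 0.36 and 0.64.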

FORTRAN · Answer 2 · 2017-02-04T22:55:58

Match the word against each alphabet. Preferably rely on characters that are unique to a language, because some languages share characters with the same codes.
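A minimal Python sketch of alphabet matching for the five languages in the question. The letter sets in `ALPHABETS` are approximate and are an assumption of this example (in particular the accented letters chosen for Portuguese and Spanish); `alphabet_match` is a hypothetical helper name.

```python
import unicodedata

LATIN = "abcdefghijklmnopqrstuvwxyz"

# Approximate letter sets; adjust them to the orthography you need.
ALPHABETS = {
    "English": set(LATIN),
    "Portuguese": set(LATIN + "áâãàçéêíóôõú"),
    "Spanish": set(LATIN + "áéíñóúü"),
    "Russian": set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя"),
    "Bulgarian": set("абвгдежзийклмнопрстуфхцчшщъьюя"),
}


def alphabet_match(word: str) -> dict:
    """Percentage of the word's letters found in each language's alphabet."""
    # NFC keeps accented letters as single code points, matching the sets above.
    normalized = unicodedata.normalize("NFC", word.lower())
    letters = [ch for ch in normalized if ch.isalpha()]
    if not letters:
        return {lang: 0.0 for lang in ALPHABETS}
    return {
        lang: 100.0 * sum(ch in alphabet for ch in letters) / len(letters)
        for lang, alphabet in ALPHABETS.items()
    }
```

For "энциклопедия" this gives 100% for Russian and a lower value for Bulgarian, because "э" is not in the Bulgarian set. A plain Latin word such as "word" scores 100% for English, Portuguese, and Spanish alike, which is why unique characters or an extra dictionary check are needed to separate languages that share an alphabet.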

Edited by @VostokSisters

The problem is not detecting the language; I repeat, any API can handle that. The problem is determining how well a word matches each of the five languages. There is one rather bad idea: compare the word with every word of a language, check it with some algorithm, compute the per-word match percentage, and then take the largest percentage as the match for that language. But think of the resource cost ... - Tabigon
And why not use the blunt method of checking alphabets? For languages such as English, Portuguese, Spanish, Russian, and Bulgarian I think it will work fine. (With more languages it might not work.) The dictionary variant is also not bad as an add-on: when the alphabets give a 50/50 result, you can fall back on it. - FORTRAN
In the end I did as you described. I compiled 5 alphabets and compared each word with each alphabet. I also took into account a couple of rules I found (for example, in Bulgarian "ъ" can appear between two consonants or at the end of a word, which does not happen in Russian). - Tabigon
@Tabigon did you test it, and what was the result? My main point is that for 5 languages there is no need to overcomplicate things (unlike, say, 100 languages, as at Google for example). - FORTRAN
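The extra "ъ" rule from the comments could look roughly like this in Python. It covers only the "between two consonants" part of the rule; the consonant list and the name `hard_sign_hints_bulgarian` are assumptions of this sketch, not Tabigon's actual code.

```python
import re

# Cyrillic consonants shared by Russian and Bulgarian (assumed list).
CONSONANTS = "бвгджзйклмнпрстфхцчшщ"

# "ъ" with a consonant on both sides.
_HARD_SIGN_BETWEEN_CONSONANTS = re.compile(f"[{CONSONANTS}]ъ[{CONSONANTS}]")


def hard_sign_hints_bulgarian(word: str) -> bool:
    """True if the word has "ъ" between two consonants.

    In Russian "ъ" stands before е, ё, ю or я, so this pattern
    points toward Bulgarian rather than Russian.
    """
    return bool(_HARD_SIGN_BETWEEN_CONSONANTS.search(word.lower()))
```

Such a check works best as a tie-breaker on top of the alphabet percentages: "български" triggers it, while Russian words such as "подъезд" or "объявление" do not.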
