I have a text and need to determine whether it is written in Russian or Ukrainian. I was using the Yandex Translate API, but it has a limit of 1 million characters per day, and 1M is not enough. Does anyone know of a Python 3 library for this? Thanks in advance.

Update: it would be enough just to know whether the text is Ukrainian.

2 answers

You can use langdetect:

    In [65]: from langdetect import detect

    In [66]: detect('"Зоряні війни" офіційно оголосили назву нового епізоду')
    Out[66]: 'uk'

    In [67]: detect('Дональд Трамп подписал указ об официальном выходе США из Транстихоокеанского партнерства')
    Out[67]: 'ru'

If you need a probability estimate for each candidate language:

    In [78]: from langdetect import detect_langs

    In [79]: detect_langs('Спробуй вгадати який це "язык"')
    Out[79]: [ru:0.7142865675080949, uk:0.28571330147081586]

    In [80]: detect_langs('Ты говоришь на "мові"')
    Out[80]: [uk:0.9999979837984605]
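Since short mixed-language strings like the two above can fool the detector, it may help to accept a verdict only when its probability is high enough. Below is a minimal sketch of that idea. The helper names `pick_ru_or_uk` and `classify` and the 0.8 threshold are my own assumptions, not part of langdetect; the `lang` and `prob` attributes and `DetectorFactory.seed` are, as far as I know, the library's documented way to read results and make them repeatable.

```python
from typing import Iterable, Optional, Tuple


def pick_ru_or_uk(scores: Iterable[Tuple[str, float]],
                  min_prob: float = 0.8) -> Optional[str]:
    """Return 'ru' or 'uk' when one of them reaches min_prob, else None."""
    best = {"ru": 0.0, "uk": 0.0}
    for lang, prob in scores:
        if lang in best:  # ignore every other language the detector proposes
            best[lang] = max(best[lang], prob)
    winner = max(best, key=best.get)
    return winner if best[winner] >= min_prob else None


def classify(text: str) -> Optional[str]:
    """Run langdetect on text and reduce its output to 'ru', 'uk' or None."""
    # Imported here so pick_ru_or_uk stays usable without langdetect installed.
    from langdetect import DetectorFactory, detect_langs

    DetectorFactory.seed = 0  # langdetect is non-deterministic unless seeded
    return pick_ru_or_uk((r.lang, r.prob) for r in detect_langs(text))


# Scores copied from the session above, so this line runs without langdetect.
print(pick_ru_or_uk([("ru", 0.7142865675080949), ("uk", 0.28571330147081586)]))
```

With the 0.8 threshold the last line prints `None`, i.e. "not sure", instead of reporting `ru` for a sentence that is mostly Ukrainian. The threshold is a guess and should be tuned on your own texts.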

The easiest and fastest option is cld2-cffi:

    pip3 install cld2-cffi

Code example:

    import cld2

    # Russian text
    details = cld2.detect("Это мой образец текста")
    print(details)
    # Output:
    # Detections(is_reliable=True, bytes_found=43, details=(Detection(language_name='RUSSIAN', language_code='ru', percent=97, score=658.0), Detection(language_name='Unknown', language_code='un', percent=0, score=0.0), Detection(language_name='Unknown', language_code='un', percent=0, score=0.0)))

    # Ukrainian text
    details = cld2.detect("Це мій зразок тексту")
    print(details)
    # Output:
    # Detections(is_reliable=True, bytes_found=39, details=(Detection(language_name='UKRAINIAN', language_code='uk', percent=97, score=862.0), Detection(language_name='Unknown', language_code='un', percent=0, score=0.0), Detection(language_name='Unknown', language_code='un', percent=0, score=0.0)))
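For the "is it Ukrainian?" case from the question, the result can be reduced to a yes/no answer. This is only a sketch: the helper `is_ukrainian` and its `min_percent` parameter are hypothetical, and the two namedtuples are stand-ins that copy the field names visible in the printed output above so the snippet runs without the library. It also assumes the first entry in `details` is the top guess, as it is in that output.

```python
from collections import namedtuple

# Stand-ins mirroring the fields shown in the printed output above; with the
# library installed you would pass the result of cld2.detect(text) instead.
Detection = namedtuple("Detection", "language_name language_code percent score")
Detections = namedtuple("Detections", "is_reliable bytes_found details")


def is_ukrainian(result, min_percent=50):
    """True when the detection is reliable and its top guess is Ukrainian."""
    if not result.is_reliable:
        return False
    top = result.details[0]  # assumption: entries are ordered best guess first
    return top.language_code == "uk" and top.percent >= min_percent


uk_sample = Detections(True, 39, (Detection("UKRAINIAN", "uk", 97, 862.0),))
ru_sample = Detections(True, 43, (Detection("RUSSIAN", "ru", 97, 658.0),))

print(is_ukrainian(uk_sample))  # True
print(is_ukrainian(ru_sample))  # False
```

In real use the call would look like `is_ukrainian(cld2.detect(text))`. Treating unreliable detections as "not Ukrainian" is a design choice; returning a third "unknown" value may suit some pipelines better.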