I'm trying to use mystem, a console program from Yandex. Info here: https://tech.yandex.ru/mystem/

In short, it takes Russian text as input and reduces nouns to the nominative case and verbs to the infinitive.

But I can't get it to process the text. Here is what I do:
1. I put the .exe in the root folder.
2. I launch cmd and type "mystem.exe I love".
I expect "I love" to turn into "love", but I get an error instead: [screenshot of the error]

Or this one, if I try to load the text from a file: [screenshot of the error]

The problem in the second case seems to be with the encoding. Please help.

  • The problem in the second case seems to be with the encoding. The program expects UTF-8 but gets CP1251, which is why it complains. Either convert the file or specify the encoding in the options. P.S. Unfortunately, the documentation says nothing about the BOM. The program may not care whether one is present, but that should be checked. - Akina
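The "convert the file" option from the comment above could look roughly like this in Python. This is only a sketch: the function name `convert_to_utf8` and the file names are made up for illustration, and it assumes the source file really is CP1251. The output is written without a BOM, since the documentation does not say how mystem handles one.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(src, dst, src_encoding="cp1251"):
    """Re-encode a text file as UTF-8 without a BOM (hypothetical helper)."""
    # Decode using the encoding the file was actually saved in.
    text = Path(src).read_text(encoding=src_encoding)
    # The "utf-8" codec writes no BOM; "utf-8-sig" would add one.
    Path(dst).write_text(text, encoding="utf-8")


if __name__ == "__main__":
    # Demonstration in a temporary directory; the file names are placeholders.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input_cp1251.txt"
        dst = Path(tmp) / "input_utf8.txt"
        src.write_bytes("Я люблю".encode("cp1251"))
        convert_to_utf8(src, dst)
        # Show the raw bytes so the change of encoding is visible.
        print(src.read_bytes())
        print(dst.read_bytes())
```

The converted file can then be passed to mystem without any encoding option, because utf-8 is the default.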

1 answer

-e
I/O encoding. The options are: cp866, cp1251, koi8-r, utf-8 (default).

mystem -e cp1251 input 

As for the first case, I'm not sure how pipes work on Windows, but it seems to me that you are passing the input incorrectly (mystem takes the argument for a file name and tries to find that file).
Try this:

 echo "Я люблю" | mystem -e cp1251 
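If the console's own encoding gets in the way, the same pipe can be driven from a script, where the bytes sent to mystem are under your control. A minimal Python sketch, assuming mystem is on the PATH and reads standard input when no input file is given (as the echo example suggests). The helper names `build_command` and `run_mystem` are mine, and `-e` is the only option used:

```python
import shutil
import subprocess


def build_command(encoding="utf-8", exe="mystem"):
    """Build the argument list; only the -e option from the answer is used."""
    return [exe, "-e", encoding]


def run_mystem(text, encoding="utf-8", exe="mystem"):
    """Send text to mystem's stdin and return its decoded stdout."""
    result = subprocess.run(
        build_command(encoding, exe),
        input=text.encode(encoding),  # bytes in the encoding named by -e
        stdout=subprocess.PIPE,
        check=True,                   # raise if mystem exits with an error
    )
    # -e is described as the I/O encoding, so decode the output the same way.
    return result.stdout.decode(encoding)


if __name__ == "__main__":
    if shutil.which("mystem"):
        print(run_mystem("Я люблю\n"))
    else:
        print("mystem not found on PATH; the command would be:", build_command())
```

The point of the design is that the text is encoded explicitly with the same name that is passed to `-e`, so the two cannot drift apart the way a console code page and a command-line option can.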
  • Thanks, it seems to have started working, but it still doesn't understand the encoding. - NikitaBobukh
  • And what is your encoding? - vp_arth
  • I tried saving the file in every encoding that Notepad offers: ANSI, UTF-8 and Unicode. - NikitaBobukh
  • Unicode is not an encoding but an encoding standard. Who came up with calling a format that? Is it UTF-16? - vp_arth
  • I don't know, but there is progress =) The input in the form echo "I love" | mystem -e cp1251 started working once I used the cp866 encoding) Now I'll try running it from R, and then everything will be great) - NikitaBobukh
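To check what a Notepad save option actually produced, one can look at the first bytes of the file. The sketch below is only an illustration: `sniff_bom` is a made-up helper that recognises the UTF-8 and UTF-16 byte-order marks and nothing else (UTF-32 is deliberately ignored). As far as I know, Notepad's "Unicode" option means UTF-16 LE with a BOM, and that is not among the encodings listed for `-e`.

```python
import codecs

# Byte-order marks checked by this sketch; UTF-32 is not handled.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    sample = "Я люблю"
    # Simulate the three kinds of file without touching the disk.
    print(sniff_bom(codecs.BOM_UTF16_LE + sample.encode("utf-16-le")))
    print(sniff_bom(sample.encode("utf-8-sig")))
    print(sniff_bom(sample.encode("cp1251")))
```

A result of `None` only means there is no BOM. It cannot tell cp1251 from cp866 or from UTF-8 without a BOM, so those still have to be chosen by hand.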