Group the word list into parts of speech and save into separate files.

Question

    import pymorphy2

    with open('C:\\Users\\nvasi\\Desktop\\slovar.txt', "r", encoding="utf-8") as f:
        lines = f.readlines()  # the dictionary: each word is on its own line
    word = lines[0].rstrip()  # the word; 0 is the line number
    morph = pymorphy2.MorphAnalyzer()  # morphological analyzer
    p = morph.parse(word)[0]
    print(p.tag)  # prints "INFN,impf,tran", where "INFN" is the part of speech

For every line (from 0 to 41469) I need to determine the part of speech and put the word into the corresponding file: for example, if p.tag is INFN, write the word into INFN.txt, and so on.


Accepted Answer · 2018-11-29T06:16:06

The .tag attribute returns a pymorphy2.tagset.OpencorporaTag object, which carries a lot of additional information:

    In [266]: tag = morph.parse("слово")[0].tag

    In [267]: tag
    Out[267]: OpencorporaTag('NOUN,inan,neut sing,nomn')

    In [268]: type(tag)
    Out[268]: pymorphy2.tagset.OpencorporaTag

    In [269]: tag.
    tag.ANIMACY   tag.GENDERS          tag.NUMBERS          tag.RARE_CASES     tag.add_grammemes_to_known
    tag.ASPECTS   tag.INVOLVEMENT      tag.PARTS_OF_SPEECH  tag.TENSES         tag.animacy
    tag.CASES     tag.KNOWN_GRAMMEMES  tag.PERSONS          tag.TRANSITIVITY   tag.aspect
    tag.FORMAT    tag.MOODS            tag.POS              tag.VOICES         tag.case

As @insolor already said in his answer, you can use the tag.POS (Part Of Speech) attribute to get the name of the part of speech as a string.
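To see what tag.POS picks out, here is a pymorphy2-free sketch that works on the tag's string form, such as 'NOUN,inan,neut sing,nomn': grammemes are separated by commas and a space, and the part of speech is the one grammeme that belongs to the OpenCorpora part-of-speech set. The helper pos_from_tag_string and the hand-written POS set are illustrative assumptions (the set is assumed to match pymorphy2's tag.PARTS_OF_SPEECH); in real code just use tag.POS. A tag with no part-of-speech grammeme (for instance "PNCT") yields None.

```python
import re

# Hand-written copy of the OpenCorpora part-of-speech grammemes
# (assumed to match pymorphy2's OpencorporaTag.PARTS_OF_SPEECH).
PARTS_OF_SPEECH = frozenset({
    "NOUN", "ADJF", "ADJS", "COMP", "VERB", "INFN", "PRTF", "PRTS", "GRND",
    "NUMR", "ADVB", "NPRO", "PRED", "PREP", "CONJ", "PRCL", "INTJ",
})


def pos_from_tag_string(tag_string):
    """Return the part-of-speech grammeme of a tag string, or None.

    Hypothetical helper for illustration only; pymorphy2 exposes this as tag.POS.
    """
    # Grammemes are separated by commas, and the two grammeme groups by a space
    grammemes = set(re.split(r"[, ]", tag_string.strip()))
    found = grammemes & PARTS_OF_SPEECH
    return next(iter(found)) if found else None


print(pos_from_tag_string("NOUN,inan,neut sing,nomn"))  # NOUN
print(pos_from_tag_string("INFN,impf,tran"))            # INFN
print(pos_from_tag_string("PNCT"))                      # None
```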

Example:

    from pathlib import Path
    from itertools import groupby
    from pymorphy2 import MorphAnalyzer

    infile = Path(r"C:\Temp\slovar.txt")
    words = infile.read_text(encoding="utf-8").splitlines()
    print(words)
    # ['каждый', 'охотник', 'желает', 'знать', 'где', 'сидит', 'фазан']

    morph = MorphAnalyzer()
    # (part of speech, word) pairs; str() turns a missing POS (None) into "None"
    items = [(str(morph.parse(w)[0].tag.POS), w) for w in words]
    print(items)
    # [('ADJF', 'каждый'), ('NOUN', 'охотник'), ('VERB', 'желает'), ('INFN', 'знать'), ('ADVB', 'где'), ('VERB', 'сидит'), ('NOUN', 'фазан')]

    # one output file per part of speech
    for g, it in groupby(sorted(items), key=lambda x: x[0]):
        outfile = infile.parent / f"{g}.txt"  # e.g. NOUN.txt, VERB.txt, None.txt
        outfile.write_text("\n".join([word for pos, word in it]), encoding="utf-8")
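The sorted() call in that loop matters: itertools.groupby only merges runs of adjacent items with equal keys, so on unsorted input the same part of speech would form several groups, and each later write_text would overwrite the file written by an earlier group. A small self-contained demonstration, using the (POS, word) pairs printed by the answer as hard-coded data instead of calling pymorphy2:

```python
from itertools import groupby
from operator import itemgetter

# (part of speech, word) pairs as printed by the answer's code
items = [("ADJF", "каждый"), ("NOUN", "охотник"), ("VERB", "желает"),
         ("INFN", "знать"), ("ADVB", "где"), ("VERB", "сидит"), ("NOUN", "фазан")]

# Without sorting, NOUN and VERB each show up as two separate groups
unsorted_keys = [pos for pos, _ in groupby(items, key=itemgetter(0))]
print(unsorted_keys)
# ['ADJF', 'NOUN', 'VERB', 'INFN', 'ADVB', 'VERB', 'NOUN']

# After sorting, every part of speech forms exactly one group
grouped = {pos: [word for _, word in group]
           for pos, group in groupby(sorted(items), key=itemgetter(0))}
print(grouped)
# {'ADJF': ['каждый'], 'ADVB': ['где'], 'INFN': ['знать'], 'NOUN': ['охотник', 'фазан'], 'VERB': ['желает', 'сидит']}
```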

Result:

NOUN.txt:

    охотник
    фазан

VERB.txt:

    желает
    сидит

...

@Nikolai Vasilenkov, I have corrected the answer: one additional file, None.txt, is now created, containing all the "words" for which pymorphy2 could not determine the part of speech.
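If sorting before grouping feels roundabout, the same files can be produced in a single pass with a dict of lists. This is a sketch, not the answer's code: group_words_by_pos and write_groups are hypothetical helpers, and the part-of-speech lookup is passed in as a function, so with pymorphy2 you would pass lambda w: morph.parse(w)[0].tag.POS. As in the answer, words whose POS is None are collected under "None" and end up in None.txt.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def group_words_by_pos(words, get_pos):
    """Map each part of speech (as a string) to the list of its words.

    get_pos(word) must return a POS name or None; str(None) gives "None",
    which mirrors the answer's str(...) call and its None.txt file.
    """
    groups = defaultdict(list)
    for word in words:
        groups[str(get_pos(word))].append(word)
    return groups


def write_groups(groups, out_dir):
    """Write each group to <out_dir>/<POS>.txt, one word per line."""
    out_dir = Path(out_dir)
    for pos, words in groups.items():
        (out_dir / f"{pos}.txt").write_text("\n".join(words), encoding="utf-8")


# Demo with a hard-coded lookup standing in for pymorphy2 (illustration only)
demo_pos = {"охотник": "NOUN", "фазан": "NOUN", "сидит": "VERB"}
groups = group_words_by_pos(["охотник", "сидит", "фазан", "123"], demo_pos.get)
with tempfile.TemporaryDirectory() as tmp:
    write_groups(groups, tmp)
    written = sorted(p.name for p in Path(tmp).iterdir())
    nouns = (Path(tmp) / "NOUN.txt").read_text(encoding="utf-8")
print(written)  # ['NOUN.txt', 'None.txt', 'VERB.txt']
```

With the answer's variables this would become write_groups(group_words_by_pos(words, lambda w: morph.parse(w)[0].tag.POS), infile.parent). Unlike groupby, the dict does not care about input order.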

Answer 2 · 2018-11-29T08:23:57

Rather than p.tag, take p.tag.POS (POS stands for part of speech): this is a ready-made string with the name of the part of speech (see the User's Guide / Working with tags). Append .txt to it to get the file name, open a file with that name for writing, and write whatever you need into it.
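The recipe in this answer can be sketched as a streaming loop that never holds the whole list in memory. One pitfall: reopening the file in "w" mode for every word truncates it each time, so this sketch opens each file once, when its part of speech is first seen, and keeps the handle open (ExitStack closes them all at the end). Reopening with "a" per word would also work, at the cost of one open per word and of appending to leftovers from earlier runs. split_by_pos is a hypothetical helper, and get_pos stands in for lambda w: morph.parse(w)[0].tag.POS.

```python
import tempfile
from contextlib import ExitStack
from pathlib import Path


def split_by_pos(words, get_pos, out_dir):
    """Stream words into <out_dir>/<POS>.txt files, one word per line.

    Each file is opened once and stays open, so later words are added
    to it instead of overwriting it. Returns the sorted file names.
    """
    out_dir = Path(out_dir)
    files = {}
    with ExitStack() as stack:  # closes every opened file on exit
        for word in words:
            name = f"{get_pos(word)}.txt"  # e.g. "INFN.txt"; None gives "None.txt"
            if name not in files:
                files[name] = stack.enter_context(
                    open(out_dir / name, "w", encoding="utf-8"))
            files[name].write(word + "\n")
    return sorted(files)


# Demo with a hard-coded lookup standing in for pymorphy2 (illustration only)
demo_pos = {"знать": "INFN", "желает": "VERB", "сидит": "VERB"}
with tempfile.TemporaryDirectory() as tmp:
    created = split_by_pos(["желает", "знать", "сидит"], demo_pos.get, tmp)
    verbs = (Path(tmp) / "VERB.txt").read_text(encoding="utf-8")
print(created)  # ['INFN.txt', 'VERB.txt']
```

There are only a couple of dozen parts of speech, so keeping one handle per file open is cheap.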
