I have a database of comments and need to find the most frequent words across all of them. So far I have only managed to count the occurrences of one specific word, which is hard-coded in the script. Is it possible to implement this search so that it works for any set of comments, texts or other material in Russian?
    import pandas as pd
    import sys
    import pymysql
    import numpy as np
    import nltk
    from nltk.corpus import state_union
    from nltk.tokenize import word_tokenize
    from nltk.corpus import stopwords

    # Connect to the database and load the comments into a DataFrame
    db = pymysql.connect(host='localhost', user='root', passwd='', database='mom_db', charset='utf8')
    df = pd.read_sql("SELECT comm2 FROM comments ", db)

    # Strip whitespace at the edges of the text and replace periods,
    # commas and other punctuation with spaces
    def delete_chars(text):
        text = text.strip()
        text = text.replace("...", " ")
        text = text.replace(".", " ")
        text = text.replace(",", " ")
        text = text.replace("-", " ")
        text = text.replace("?", " ")
        text = text.replace("!", " ")
        text = text.replace(")", " ")
        text = text.replace("(", " ")
        text = text.replace("–", " ")
        text = text.replace(":", " ")
        text = text.replace("<", " ")
        text = text.replace(">", " ")
        text = text.replace("/", " ")
        text = text.replace("``", " ")
        text = text.replace("'", " ")
        text = text.replace("«", " ")
        text = text.replace("»", " ")
        text = text.replace(";", " ")
        return text.lower()

    df['comm2'] = df['comm2'].apply(delete_chars)

    st_w = set(stopwords.words('russian'))

    # Tokenize every comment and keep the words that are not stop words
    words_filtered = []
    for comment in df['comm2']:
        for w in word_tokenize(comment):
            if w not in st_w:
                words_filtered.append(w)
    print(words_filtered)

    # The text was lowercased above, so the search word must be lowercase too
    count = words_filtered.count('европе')
    print(count)

P.S. Is there a better alternative to the delete_chars function?
An example of the required output:

    "word_1" 33
    "word_2" 22
    "word_3" 11
Take a look at plot_word_cloud() from this answer. I did practically the same thing there that you want to do ... - MaxU
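One possible approach, shown here only as a minimal sketch: a single regular expression can stand in for the chain of replace calls in delete_chars, and collections.Counter can count every word at once instead of one hard-coded word. The names top_words, WORD_RE, comments and stop_words are hypothetical and not part of the original code. The sample comments and the small stop-word set are invented so that the sketch runs without a database or NLTK data; in the real script they would presumably be df['comm2'] and set(stopwords.words('russian')).

```python
import re
from collections import Counter

# Hypothetical sample data standing in for df['comm2'] from the question.
comments = [
    "В Европе сейчас холодно, а газ дорогой!",
    "Европе нужен газ... и тепло?",
    "«Газ» - это важно.",
]

# Hand-written stop words (an assumption made so the sketch is self-contained);
# set(stopwords.words('russian')) from NLTK could be used instead.
stop_words = {"в", "а", "и", "это", "сейчас"}

# Matches runs of lowercase Cyrillic letters. "ё" is listed explicitly
# because it lies outside the а-я range. Punctuation, digits and Latin
# letters are simply never matched, so no separate cleanup step is needed.
WORD_RE = re.compile(r"[а-яё]+")


def top_words(texts, stop_words, n=10):
    """Return the n most frequent non-stop words as (word, count) pairs."""
    counter = Counter()
    for text in texts:
        tokens = WORD_RE.findall(text.lower())
        counter.update(t for t in tokens if t not in stop_words)
    return counter.most_common(n)


# Print in the format requested in the question: "word" count
for word, count in top_words(comments, stop_words, n=2):
    print(f'"{word}" {count}')
```

Because top_words accepts any iterable of strings, it should work the same way with a pandas column, a list read from a file, or any other text source. Two limitations of this sketch: hyphenated words are split into their parts, and different inflected forms of the same word are counted separately, so grouping them would need an additional lemmatization step.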