I have a DataFrame whose first and second columns contain text. I want to compute the average word length of each value in one of those columns.
I wrote the following function:
    def av_len(a,c):
        column_n = a.columns[c] + '_word_len'
        sum = 0
        number = []
        for i in range(a.shape[0]):
            df = p.findall(a.iloc[i][c])
            for j in range(len(df)):
                sum += len((df)[j])
            total = sum/len(df)
            number.append(total)
            sum = 0
        a = a.assign(column_n = pd.Series(number))
        return a[:5]

We pass in a DataFrame and a column number, and get back one more column with the required parameter. The problem is that the new column must be named "source_column_word_len", that is, the name of the source column followed by "_word_len".
I tried to do this with:

    column_n = a.columns[c] + '_word_len'

and

    a = a.assign(column_n = pd.Series(number))

However, the new column is still named "column_n". What is the best way to implement this?
I need each text column to get its own new column with the measured parameter.
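For reference, one possible way to get the intended name is sketched below. The likely cause is that `assign(column_n=...)` treats `column_n` as a literal keyword rather than as the variable, so unpacking a dict (or plain `a[column_n] = ...`) lets the name come from the variable instead. The pattern `p` is not shown in the question, so `WORD_RE`, `add_word_len`, and the sample data here are assumptions made only for illustration.

```python
import re

import pandas as pd

# Assumption: a "word" is a run of word characters; the question's pattern `p` is not shown.
WORD_RE = re.compile(r"\w+")


def add_word_len(df, col_idx):
    """Return a copy of df with a '<source column>_word_len' column added."""
    source = df.columns[col_idx]
    new_name = f"{source}_word_len"

    def mean_len(text):
        words = WORD_RE.findall(str(text))
        # Return NaN instead of dividing by zero when no words are found.
        return sum(len(w) for w in words) / len(words) if words else float("nan")

    # Unpacking a dict makes assign() use the value of new_name as the column name.
    return df.assign(**{new_name: df[source].map(mean_len)})


# Hypothetical sample data with two text columns.
df = pd.DataFrame({"title": ["red apple", "kiwi"], "body": ["a bb ccc", ""]})
for i in range(2):
    df = add_word_len(df, i)
print(df.columns.tolist())
```

Calling the function once per text column, as in the loop above, gives each source column its own "_word_len" column.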
ipython is the console analog of Jupyter. You can try to save the data as CSV: df.head(5).to_csv(r'c:/temp/aaa.csv') and post the CSV in your question - MaxU