I have a DataFrame whose first and second columns contain text. For one of these columns, I want to compute the average word length of each value.

I wrote the following function:

    def av_len(a, c):
        column_n = a.columns[c] + '_word_len'
        sum = 0
        number = []
        for i in range(a.shape[0]):
            df = p.findall(a.iloc[i][c])
            for j in range(len(df)):
                sum += len((df)[j])
            total = sum/len(df)
            number.append(total)
            sum = 0
        a = a.assign(column_n = pd.Series(number))
        return a[:5]

We pass in a DataFrame and a column number, and get back the DataFrame with one more column holding the computed value. The problem is that the new column must be named "source_column_word_len", i.e. the source column's name followed by "_word_len".

I tried to do this with:

 column_n = a.columns[c] + '_word_len' 

and

 a = a.assign(column_n = pd.Series(number)) 

However, the new column is still named "column_n". What is the best way to implement this?

I need to create a separate column with the measured value for each text column.

  • Can you give a small example of anonymized data (as text) and the desired resulting DF? Are you looking for the average word length in a column for each row? - MaxU September
  • Yes, in the column for each row. Can I somehow insert a result string from Jupyter Notebook here in a readable form? - Ilya Mitusov
  • String or DF? - MaxU September
  • How did you insert the output into your answer here? ru.stackoverflow.com/questions/681724/… - Ilya Mitusov
  • I use ipython, the console analog of Jupyter. You can try saving the data as CSV: df.head(5).to_csv(r'c:/temp/aaa.csv') and posting the CSV in your question - MaxU

1 answer

Change:

 a = a.assign(column_n = pd.Series(number)) 

to

 a[column_n] = number 
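
The reason `assign` ignores the variable is that Python takes keyword argument names literally: `assign(column_n=...)` always creates a column called "column_n", whatever the variable holds. Below is a minimal sketch of the difference, using a made-up sample frame and placeholder values (neither comes from the question). If you prefer to keep `assign`, unpacking a dict with `**` passes the computed name.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
a = pd.DataFrame({"title": ["red fox", "lazy dog"], "body": ["a bc", "hello"]})
column_n = a.columns[0] + "_word_len"   # evaluates to "title_word_len"
lengths = [3.0, 3.5]                    # placeholder values for illustration

# The keyword name is used literally, so the column is called "column_n".
literal = a.assign(column_n=lengths)

# Unpacking a dict passes the computed string as the keyword name.
unpacked = a.assign(**{column_n: lengths})

# Item assignment with the variable as the key, as in the answer above.
by_key = a.copy()
by_key[column_n] = lengths

print(list(literal.columns))
print(list(unpacked.columns))
print(list(by_key.columns))
```

Note that `a[column_n] = number` modifies the frame in place, while `assign` returns a new frame.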

BUT the nested loop is very inefficient. You can almost certainly use a vectorized approach instead. Without an example of input and output data, though, it is hard to say for sure, because it is unclear what the data looks like and how the variable p is defined.
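
As a rough sketch of what a loop-free version might look like: the code below assumes that p is a compiled regular expression matching words, such as `re.compile(r'\w+')`. That is a guess, since the question does not show it. The function name `add_word_len`, the `pattern` parameter, and the sample data are all hypothetical.

```python
import pandas as pd

def add_word_len(df, col_idx, pattern=r"\w+"):
    """Return a copy of df with an added '<column>_word_len' column.

    `pattern` is an assumed stand-in for the regex behind `p` in the question.
    """
    col = df.columns[col_idx]
    words = df[col].str.findall(pattern)         # list of words per row
    total_chars = words.str.join("").str.len()   # summed word lengths per row
    n_words = words.str.len()                    # number of words per row
    out = df.copy()
    out[col + "_word_len"] = total_chars / n_words  # NaN for rows with no words
    return out

# Hypothetical sample data.
sample = pd.DataFrame({"text": ["red fox", "a bc def", ""]})
result = add_word_len(sample, 0)
print(result)
```

Unlike the original function, this returns the whole frame rather than the first five rows, and a row with no words yields NaN instead of raising ZeroDivisionError.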

  • Yes, it works. I will look into the nested loop. Thank you - Ilya Mitusov
  • Tip: create a new question with a small data sample as text or CSV and the expected result. I am 99% sure that you can find a much more efficient solution. - MaxU September