New column in pd.DataFrame

Question

I have a DataFrame whose first and second columns contain text. I want to compute the average word length of each value in one of those columns.

I wrote the following function:

 def av_len(a,c):
     column_n = a.columns[c] + '_word_len'
     sum = 0
     number = []
     for i in range(a.shape[0]):
         df = p.findall(a.iloc[i][c])
         for j in range(len(df)):
             sum += len((df)[j])
         total = sum/len(df)
         number.append(total)
         sum = 0
     a = a.assign(column_n = pd.Series(number))
     return a[:5]

The function takes a DataFrame and a column number and returns the DataFrame with one more column holding the computed value. The problem is that the new column must be named "source_column_word_len", that is, the name of the source column followed by "_word_len".

I tried to do this with:

 column_n = a.columns[c] + '_word_len'

and

 a = a.assign(column_n = pd.Series(number))

However, the new column is still named "column_n". What is the best way to implement this?

I need each text column to get its own new column with the measured value.

Can you give a small example of anonymous data (as text) and the desired resulting DF?
Are you looking for the average word length in a column for each line?
Can I somehow insert a result string from Jupyter Notebook here in a readable form?
You can try to save the data as CSV: df.head(5).to_csv(r'c:/temp/aaa.csv') and post the CSV in your question

MaxU · Accepted Answer · 2017-09-16T10:19:41

Change:

 a = a.assign(column_n = pd.Series(number))

to

 a[column_n] = number
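
To illustrate why the original line ignores the variable, here is a minimal sketch. The sample data and the column name "title" are made up for illustration and are not from the question. `assign` treats the keyword as the literal column name, while bracket assignment (or unpacking a dict into `assign`) uses the string stored in the variable.

```python
import pandas as pd

# Made-up sample data; the column name "title" is an assumption for illustration.
df = pd.DataFrame({"title": ["red apple", "blue sky"]})
column_n = df.columns[0] + "_word_len"  # evaluates to "title_word_len"
values = [4.0, 3.5]

# assign() takes keyword names literally, so the new column is called "column_n".
literal = df.assign(column_n=values)

# Bracket assignment uses the string held in the variable as the column name.
bracket = df.copy()
bracket[column_n] = values

# Unpacking a dict keeps assign() but passes the computed name as the keyword.
unpacked = df.assign(**{column_n: values})

print(list(literal.columns))
print(list(bracket.columns))
print(list(unpacked.columns))
```

The dict-unpacking form is handy if you prefer `assign` because it returns a new DataFrame instead of modifying the existing one.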

However, the nested loop is very inefficient, and a vectorized approach should almost certainly work here. Without an example of the input and the expected output it is hard to say for sure, because it is not clear what the data looks like or how the variable p is defined.

Tip: create a new question with a small data sample as text or CSV and the expected result.
I am 99% sure that a much more efficient solution exists.
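
As a rough idea of what a loop-free version might look like, here is a sketch under assumptions: the question does not show p, so I assume a compiled regex equivalent to `\w+`, and the sample data and the function name `add_avg_word_len` are hypothetical. It uses the pandas string methods `str.findall`, `str.join` and `str.len` instead of indexing rows one by one.

```python
import pandas as pd

def add_avg_word_len(frame, col_idx, pattern=r"\w+"):
    """Return a copy of frame with a '<source column>_word_len' column added.

    pattern is an assumption: the question does not show how p is defined.
    """
    source = frame.columns[col_idx]
    words = frame[source].str.findall(pattern)    # list of words per row
    total_chars = words.str.join("").str.len()    # characters in all words of a row
    word_count = words.str.len()                  # number of words in a row
    out = frame.copy()
    # A row without any words gives 0/0, which pandas stores as NaN.
    out[source + "_word_len"] = total_chars / word_count
    return out

# Made-up sample data with two text columns.
df = pd.DataFrame({
    "title": ["red apple", "blue sky"],
    "body": ["a bb ccc", "hello"],
})

result = df
for idx in (0, 1):  # positions of the two text columns
    result = add_avg_word_len(result, idx)

print(result)
```

Each call appends its new column at the end, so the positions of the original text columns stay the same between calls.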
