Jupyter Notebook: a loop over a DataFrame runs for a very long time

Question

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randint(low=0, high=10, size=(1000000, 2)), columns=['a', 'b'])
    df[['a', 'b']] = df[['a', 'b']].astype(str)

    # Row-by-row loop through pandas indexing (this is the slow part); stops one row early so i + 1 exists
    for i in range(2, df.shape[0] - 1):
        df.loc[i, 'b'] = df['a'][i - 2] + ' ' + df['a'][i - 1] + ' ' + df['a'][i] + ' ' + df['a'][i + 1]

The loop runs for a very long time. I have a DataFrame with 1 million rows, and I need to fill the string column "b" with elements of the string column "a", row by row. Such a run takes 30-60 minutes. How should a loop like this be written in Python?
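If a loop really is needed, a common workaround (not taken from this thread) is to loop over plain Python lists instead of reading and writing DataFrame cells one by one, and then assign the finished column in a single step. A minimal sketch, assuming the same columns "a" and "b" as above; rows without a full 4-word window are left empty (None), which is an arbitrary choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(low=0, high=10, size=(1000000, 2)), columns=['a', 'b'])
df[['a', 'b']] = df[['a', 'b']].astype(str)

a = df['a'].tolist()  # plain Python list: indexing it is cheap
n = len(a)

# Build every window a[i-2], a[i-1], a[i], a[i+1] in pure Python
b = [None] * n  # rows 0, 1 and n-1 have no full window
for i in range(2, n - 1):
    b[i] = ' '.join((a[i - 2], a[i - 1], a[i], a[i + 1]))

df['b'] = b  # one column assignment instead of a million cell writes
```

The loop itself is the same as in the question; only the containers change, which removes the per-cell pandas indexing overhead.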

I often run into this problem in Jupyter. Even a simple loop such as

    for i in range(0, df.shape[0] - 1):  # stop one row early so that i + 1 exists
        df.loc[i, 'b'] = df['a'][i + 1]

takes a long time.
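That simple loop only copies the next row of "a" into "b", which Series.shift can do for the whole column in one vectorized call. A minimal sketch on a small made-up frame; shift(-1) leaves NaN in the last row because there is no next row to copy.

```python
import pandas as pd

# Small made-up example frame
df = pd.DataFrame({'a': ['w1', 'w2', 'w3', 'w4'], 'b': ''})

# Vectorized equivalent of: for i ...: df['b'][i] = df['a'][i + 1]
df['b'] = df['a'].shift(-1)  # row i gets a[i + 1]; the last row becomes NaN

print(df)
```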

The problem I am actually solving:
I have 30 words. Sentences made of these words are stored in the df['a'] column, one word per cell. I want to find the frequency distribution of 4-word sequences of these words, e.g. the sequence "mom" + "loves" + "strongly" + "waving" occurs much less often than "mom" + "strongly" + "loves" + "waving", and so on.

That is what df['b'] is created for.
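For the counting itself, the intermediate column is not strictly required. A minimal sketch using collections.Counter over every run of 4 consecutive words; the word list is made up for illustration, and in the real case it would come from df['a'].tolist(). Like the column-based approach, this also counts windows that cross sentence boundaries.

```python
from collections import Counter

# Made-up word sequence; in practice: words = df['a'].tolist()
words = ['mom', 'loves', 'strongly', 'waving',
         'mom', 'loves', 'strongly', 'waving',
         'mom', 'strongly', 'loves', 'waving']

# zip over four offset views yields every window of 4 consecutive words
fourgrams = zip(words, words[1:], words[2:], words[3:])
counts = Counter(fourgrams)

for gram, n in counts.most_common(3):
    print(' '.join(gram), n)
```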

Answer
Row-by-row loops over a DataFrame really are slow, in Jupyter or anywhere else (see https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6). That article clearly explains how to work with DataFrames efficiently, but I was too lazy to go through all of it. In the end, I did it like this:

    # row i gets a[i-2] a[i-1] a[i] a[i+1]; the first two rows and the last row become NaN
    df['b'] = df.a.shift(2) + ' ' + df.a.shift(1) + ' ' + df.a + ' ' + df.a.shift(-1)
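A small check of the shift-based answer on a made-up word column. dropna() removes the rows that have no full window, and value_counts() then gives the distribution of 4-word sequences that the question is after.

```python
import pandas as pd

# Made-up word column, one word per row
df = pd.DataFrame({'a': ['mom', 'loves', 'strongly', 'waving',
                         'mom', 'loves', 'strongly', 'waving']})

# Row i gets a[i-2] a[i-1] a[i] a[i+1]; rows without a full window become NaN
df['b'] = df.a.shift(2) + ' ' + df.a.shift(1) + ' ' + df.a + ' ' + df.a.shift(-1)

# Frequency of each 4-word sequence
distribution = df['b'].dropna().value_counts()
print(distribution)
```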

Thanks, everyone!

I suspect you are trying to implement a sliding window, but it is not clear why you are working with strings rather than numbers.
Could you share a more realistic data set? You can upload it to any file-sharing service.
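One possible reading of the "numbers instead of strings" remark, sketched as an assumption rather than the commenter's actual code: encode each word as an integer with pd.factorize, build every window of 4 codes with numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20+), and combine each window into a single integer key using the vocabulary size k as the base. The word list is made up; with 30 words the largest key is below 30**4 = 810000.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Made-up data: one word per row
words = pd.Series(['mom', 'loves', 'strongly', 'waving',
                   'mom', 'loves', 'strongly', 'waving'])

# Map each word to an integer code 0..k-1
codes, uniques = pd.factorize(words)
k = len(uniques)

# Every window of 4 consecutive codes, shape (len(words) - 3, 4)
windows = sliding_window_view(codes, 4)

# Combine the 4 codes into one key: c0*k^3 + c1*k^2 + c2*k + c3
keys = windows @ (k ** np.arange(3, -1, -1))

# Count each key and decode the most frequent one back to words
values, counts = np.unique(keys, return_counts=True)
top = values[counts.argmax()]
digits = [(top // k ** p) % k for p in (3, 2, 1, 0)]
print(' '.join(uniques[digits]), counts.max())
```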
What you are trying to do is already implemented in scikit-learn's CountVectorizer; see its ngram_range parameter.
