In short, the problem is this: I have a DataFrame containing data of the form:

    Id  Sequence
    3   1,3,13...
    7   1,2,1,...
    8   1,2,4,...
    11  1,8,25...
    13  1,111,...

Here `Id` is the sequence's number and `Sequence` is the sequence itself, stored as a comma-separated string. The task is to take, for example, the first sequence and lay it out as a column, then do the same for all the others. The sequences all have different lengths.
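A minimal sketch of one way to do this split, assuming the goal is one cell per element: `Series.str.split` with `expand=True` creates one column per element and pads shorter sequences with missing values, so differing lengths are not a problem. The three sequences below are shortened prefixes of rows 3, 7 and 8 from the output further down; the names `wide` and `tall` are only illustrative.

```python
import pandas as pd

# Shortened prefixes of three sequences from the question, indexed by Id
df = pd.DataFrame(
    {"Sequence": ["1,3,13,87", "1,2,1", "1,2,4,5,8"]},
    index=pd.Index([3, 7, 8], name="Id"),
)

# One column per element; shorter sequences are padded with missing values
wide = df["Sequence"].str.split(",", expand=True)

# Convert the string pieces to numbers (missing values become NaN)
wide = wide.apply(pd.to_numeric)

# Transpose so that each sequence becomes a column labelled by its Id
tall = wide.T

print(wide)
print(tall)
```

`wide` keeps one row per `Id` with one column per element; `tall` is the same table transposed, with one column per sequence. Which one matches "lay it out as a column" depends on the intended layout. Very long terms may not fit exactly in int64 or float64; if exact values matter, skip the `to_numeric` step and keep the strings.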
I do this:

    # Import the required packages
    import sys
    import warnings
    import pandas as pd
    import numpy as np
    from sklearn.metrics import mean_absolute_error, mean_squared_error
    import statsmodels.formula.api as smf
    import statsmodels.tsa.api as smt
    import statsmodels.api as sm
    import scipy.stats as scs
    from scipy.optimize import minimize
    import matplotlib.pyplot as plt

    # Read the data and display it
    dftrain = pd.read_csv('../../data/IntegerSeqTrain.csv', sep=",", index_col=['Id'])
    dftrain.head(10)

This returns the following (the fact that `Sequence` is printed above `Id` already confuses me):

        Sequence
    Id
    3   1,3,13...
    7   1,2,1,...
    8   1,2,4,...
    11  1,8,25...
    13  1,111,...

Next, the splitting itself:

    # For convenience, lay the sequences out in a column, splitting them on commas first
    print(dftrain.shape[1])
    i = 0
    for dfitem in dftrain:
        j = 0
        for dfitem2 in dfitem:
            dftrain[j] = dftrain['Sequence'].str.split(',').str.get(j)
            j += 1
        i += 1
    # Drop the now-redundant column
    dftrain = dftrain.drop('Sequence', axis=1)
    #pd.set_option('max_colwidth', 10)
    # See what we got
    print(dftrain.head(10))

Output:

            0     1      2        3          4          5          6          7
    Id
    3       1     3     13       87       1053      28576    2141733  508147108
    7       1     2      1        5          5          1         11         16
    8       1     2      4        5          8         10         16         20
    11      1     8     25       83        274       2275     132224    1060067
    13      1   111  12211  1343211  147753211  162528...  178781...  196659...
    15      1     1      1        1          1          1          1          1
    16    840  1320   1680     2520       3192       3432       4920       5208

- Each sequence ends up spread along a row, not down a column.
- Far fewer columns are created than needed: only 8 or 9, although the sequences are actually much longer.
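The cap of 8 columns can be traced to the loops: iterating over a DataFrame yields its column labels, and iterating over a string yields its characters. The sketch below checks this on a small hypothetical frame (the sequence values are made up) and computes how many columns the longest sequence would actually need.

```python
import pandas as pd

# Hypothetical two-row stand-in for dftrain, indexed by Id as in the question
df = pd.DataFrame(
    {"Sequence": ["1,2,3,4,5,6,7,8,9,10", "1,2"]},
    index=pd.Index([3, 7], name="Id"),
)

# Iterating over a DataFrame yields its column labels, not its rows
labels = [label for label in df]
print(labels)  # ['Sequence']

# The inner loop then walks over the characters of that label,
# so it runs len('Sequence') == 8 times and creates columns 0..7
print(len(labels[0]))  # 8

# Columns actually needed: number of elements in the longest sequence
n_cols = int(df["Sequence"].str.count(",").max()) + 1
print(n_cols)  # 10
```

So the column count comes from the length of the word `Sequence`, not from the data.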
How can the data be split cleanly and presented as columns? Thanks in advance.

The data itself (train.csv): https://dropmefiles.com/osxrI
Source (IPYNB file): https://dropmefiles.com/cIR4f
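On the side question about `Sequence` appearing above `Id` in the first printout: with `index_col=['Id']`, `Id` becomes the index, and pandas prints the index name on its own line under the column headers. A small sketch with a hypothetical two-row CSV held in memory, assuming (as the single `Sequence` column in the output suggests) that the real file quotes each sequence:

```python
import io

import pandas as pd

# Hypothetical two-row CSV; sequences are quoted because they contain commas
csv_text = 'Id,Sequence\n3,"1,3,13"\n7,"1,2,1"\n'

# Same read_csv options as in the question
df = pd.read_csv(io.StringIO(csv_text), sep=",", index_col=["Id"])
print(df)  # 'Sequence' is the column header; 'Id' is the index name on the next line

# Id is no longer a column, it is the index
print(df.index.name)  # Id

# reset_index() moves Id back into an ordinary column
flat = df.reset_index()
print(list(flat.columns))  # ['Id', 'Sequence']
```

Nothing is wrong with the data here; the two-line header is only how pandas displays a named index.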