I have a data set with the columns "UserID", "System" (the systems used by the user), and "CONCAT A" (the concatenation of these two columns). Here is an example of the data:

    >>> RolCatBR_IDMqes1.loc[0:11]
         UserID            System         CONCAT A
    0   ANTANAS  P1B_010, P2Z_010  P1B_010|ANTANAS
    1   AWYGASC  P1B_010, P2Z_010  P1B_010|AWYGASC
    2   CHENQIA  P1B_010, P2Z_010  P1B_010|CHENQIA
    3   CHENQIA  P3Z_020, P3Z_030  P3Z_020|CHENQIA
    4   DBORZUT  P1B_010, P2Z_010  P1B_010|DBORZUT
    5   DURAKER  P1B_010, P2Z_010  P1B_010|DURAKER
    6   JEBINDE  P1B_010, P2Z_010  P1B_010|JEBINDE
    7   SMETTAN  P1B_010, P2Z_010  P1B_010|SMETTAN
    8   TKAUL13  P3Z_020, P3Z_030  P3Z_020|TKAUL13
    9   VATERCH  P3Z_020, P3Z_030  P3Z_020|VATERCH
    10  ABUNNEN           P2Z_010  P2Z_010|ABUNNEN
    11  AMILSKI           P2Z_010  P2Z_010|AMILSKI

For example, in the first row (index 0) I need to extract the second system, P2Z_010, create a new row with the same UserID, and put P2Z_010 in its System column together with an updated CONCAT A.
The result should look like this:

          UserID   System         CONCAT A
    0    ANTANAS  P1B_010  P1B_010|ANTANAS
    0.5  ANTANAS  P2Z_010  P2Z_010|ANTANAS
    1    AWYGASC  P1B_010  P1B_010|AWYGASC
    1.5  AWYGASC  P2Z_010  P2Z_010|AWYGASC
    ...

I tried to apply the method suggested by @Wen:

    s2 = RolCatBR_IDMqes1['System'].str.split(',')
    w2 = pd.DataFrame({
        'UserID': RolCatBR_IDMqes1['UserID'].repeat(s2.str.len().fillna(value=0).astype(int)),
        'System': sum(s2.tolist(), []),
        'CONCATA': RolCatBR_IDMqes1['CONCATA'].repeat(s2.str.len().fillna(value=0).astype(int))
    })

But I get an error and I do not know how to fix it:

      File "<ipython-input-93-42d2e6fcce42>", line 1, in <module>
        sum(s2.tolist(),[])
    TypeError: can only concatenate list (not "float") to list

How can I extract each system from a cell that holds several values and put it in a duplicated row? Or how can I correct the error so that this method works?
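A minimal sketch that reproduces this error on a tiny, made-up Series (the values below are illustrative, not my real data). It assumes the cause is a missing value in "System": `str.split` leaves NaN, which is a float, in place instead of producing a list, so `sum(..., [])` ends up trying to add a float to a list.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one cell with two systems, one missing, one single system.
s = pd.Series(["P1B_010, P2Z_010", np.nan, "P2Z_010"], dtype=object)

parts = s.str.split(",")

# The missing value is passed through as NaN (a float), not as a list.
types = [type(x).__name__ for x in parts.tolist()]

# Concatenating the lists with sum() fails as soon as it reaches the NaN.
try:
    sum(parts.tolist(), [])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(types)
print(error_message)
```

If this is indeed the cause, the NaN entries have to be removed or handled before the lists are concatenated.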
Sample data for error reproduction:

    df1 = df.iloc[1130:1140]
    df1
    Out[79]:
           UserID   System         CONCAT A
    1130      NaN      NaN              NaN
    1131  AYNERDO  P1B_010  P1B_010|AYNERDO
    1132  CKIESCH  P1B_010  P1B_010|CKIESCH
    1133  JBRETTS  P1B_010  P1B_010|JBRETTS
    1134  YASSMAN  P1B_010  P1B_010|YASSMAN
    1135  EPFITZE  P1B_010  P1B_010|EPFITZE
    1136      NaN      NaN              NaN
    1137  HUBBARA  P1B_010  P1B_010|HUBBARA
    1138  TQUINTO  P1B_010  P1B_010|TQUINTO
    1139      NaN      NaN              NaN

    list(df1)
    Out[80]: ['UserID', 'System', 'CONCAT A']

Then I defined the explode function and executed this code:

    res = explode(df.assign(System=df['System'].str.split(',\s*', expand=False)), ['System'])

It fails with:

      line 42, in _wrapit
        result = getattr(asarray(obj), method)(*args, **kwds)
    TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'

I think I understand why this is happening: the cause is probably the NaN values. Should I replace them with zero, or is it better to remove those rows?
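A minimal sketch of the "remove them" option, on made-up rows shaped like my data. Note that it swaps in the built-in `DataFrame.explode` (which I assume is available, i.e. pandas 0.25 or newer) instead of the custom `explode` function above, and it assumes that rows with a missing "System" can simply be dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical sample, including an all-NaN row like index 1130 in my data.
df = pd.DataFrame({
    "UserID": ["ANTANAS", np.nan, "ABUNNEN"],
    "System": ["P1B_010, P2Z_010", np.nan, "P2Z_010"],
    "CONCAT A": ["P1B_010|ANTANAS", np.nan, "P2Z_010|ABUNNEN"],
})

# Drop the rows where "System" is missing before splitting.
clean = df.dropna(subset=["System"])

# Split on commas (plus optional spaces) and give every system its own row.
res = (
    clean.assign(System=clean["System"].str.split(r",\s*"))
    .explode("System")
    .reset_index(drop=True)
)

# Rebuild the concatenation so that it matches the new "System" value.
res["CONCAT A"] = res["System"] + "|" + res["UserID"]

print(res)
```

The index is renumbered here rather than using 0, 0.5, 1, 1.5 as in the desired output above. Replacing NaN with zero would not help on its own, because the string split expects strings rather than numbers.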