I have a string use[1] = "строка \n string".encode('utf-8'). I convert it to bytes, put it into a Pandas table and save the table as CSV. I store it as bytes so that the string can be extracted later, because it contains the \n newline character; if I leave that character in, the table cannot be read back later. Then I read the table:

 train_dataset = np.genfromtxt('data', usecols=use[1:4], delimiter=';', dtype=object, skip_header=1)

After that I decode the data from bytes to UTF-8 with the loop below. I get a value of type str, but its content is still the listed bytes.

 for x in range(train_dataset.shape[0]):
     train_dataset[x][0] = train_dataset[x][0].decode('utf-8')
     train_dataset[x][1] = train_dataset[x][1].decode('utf-8')
     train_dataset[x][2] = train_dataset[x][2].decode('utf-8')
 print(train_dataset)

The result is

 "b'\\xd1\\x81\\xd1\\x82\\xd1\\x80\\xd0\\xbe\\xd0\\xba\\xd0\\xb0 \\n \\xd1\\x81\\xd1\\x82\\xd1\\x80\\xd0\\xb8\\xd0\\xbd\\xd0\\xb3'"

and it is of type str. How do I convert it back to bytes?
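The str shown above looks like the text of a Python bytes literal, i.e. what str() produces for a bytes object (presumably what ended up in the CSV when the bytes column was written). A minimal sketch, assuming that is the case: ast.literal_eval from the standard library parses that text back into a real bytes object, which can then be decoded. This technique is not from the thread itself, and the variable names are hypothetical.

```python
import ast

# Assumption: the CSV cell holds the *text* of a bytes literal, i.e. what str(b"...")
# returns. We recreate such a cell here so the sketch is self-contained.
original = "строка \n string"
cell = str(original.encode("utf-8"))

# ast.literal_eval safely evaluates a Python literal (it does not run arbitrary code),
# turning the text of the bytes literal back into a real bytes object.
raw = ast.literal_eval(cell)

# Decoding the recovered bytes gives back the original string, newline included.
text = raw.decode("utf-8")
print(type(cell).__name__, type(raw).__name__, repr(text))
```

Applied to the loop in the question, each cell would become something like ast.literal_eval(train_dataset[x][0].decode('utf-8')).decode('utf-8'), again assuming the file really contains bytes literals.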

    1 answer

    There is no need to invent anything: Pandas handles line breaks inside values just fine:

     In [22]: df = pd.DataFrame({
         ...:     'id': [1,2,3],
         ...:     'text': ['aaa', 'xxx\nyyy\nzzz', 'ccc'],
         ...:     'val': [10,20,30]
         ...: })

     In [23]: df
     Out[23]:
        id           text  val
     0   1            aaa   10
     1   2  xxx\nyyy\nzzz   20
     2   3            ccc   30

     In [24]: print(df.loc[1, 'text'])
     xxx
     yyy
     zzz

     In [25]: df.to_csv('c:/temp/1.csv', index=False)

     In [26]: pd.read_csv('c:/temp/1.csv')
     Out[26]:
        id           text  val
     0   1            aaa   10
     1   2  xxx\nyyy\nzzz   20
     2   3            ccc   30

    The resulting CSV file. Note that the value containing line breaks is enclosed in double quotes; without the quotes the file would not be valid CSV:

     id,text,val
     1,aaa,10
     2,"xxx
     yyy
     zzz",20
     3,ccc,30
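    The round trip in the session above can be checked as a standalone script. This is a minimal sketch that writes to an in-memory io.StringIO buffer instead of c:/temp/1.csv (an assumption made only so it runs anywhere); it confirms that the multi-line value is quoted in the CSV text and comes back unchanged.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "text": ["aaa", "xxx\nyyy\nzzz", "ccc"],
    "val": [10, 20, 30],
})

# Write to an in-memory buffer instead of a file on disk.
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)

# The default quoting (csv.QUOTE_MINIMAL) wraps the field with newlines in double quotes.
assert '"xxx\nyyy\nzzz"' in csv_text

# Reading it back restores the value with its embedded newlines intact.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored.loc[1, "text"])
```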

    UPDATE: how to read the CSV file into a NumPy ndarray:

    Use the DataFrame.values attribute:

     In [43]: pd.read_csv('c:/temp/1.csv').values
     Out[43]:
     array([[1, 'aaa', 10],
            [2, 'xxx\nyyy\nzzz', 20],
            [3, 'ccc', 30]], dtype=object)

    Since version 0.24.0, Pandas also provides the DataFrame.to_numpy() method:

     In [44]: pd.read_csv('c:/temp/1.csv').to_numpy()
     Out[44]:
     array([[1, 'aaa', 10],
            [2, 'xxx\nyyy\nzzz', 20],
            [3, 'ccc', 30]], dtype=object)
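    The question's genfromtxt call uses delimiter=';', so a file written with that separator must also be read with it. A minimal sketch, assuming a semicolon-separated file and using an in-memory buffer in place of a real path: the same sep is passed to both to_csv and read_csv, and to_numpy() then returns the ndarray.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "text": ["aaa", "xxx\nyyy\nzzz", "ccc"],
    "val": [10, 20, 30],
})

# Assumption: ';' is the separator, as in the question; it must match on write and read.
buf = io.StringIO()
df.to_csv(buf, sep=";", index=False)

# read_csv parses the quoted multi-line field; to_numpy() (pandas >= 0.24) returns an
# ndarray, and the mix of int and str columns gives dtype=object.
arr = pd.read_csv(io.StringIO(buf.getvalue()), sep=";").to_numpy()
print(arr)
```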
    • You use a Pandas table to save and load, but I need to load the CSV file not into a Pandas table but into a NumPy array. I use Pandas only to save to CSV. I managed to do it by replacing \n with another character using the replace method, but the question of how to get bytes back from a string that contains them (a value of type str holding the text of a bytes value) remains unresolved. Sorry, and thanks for the answer - Alexandr1234567890
    • @Alexandr1234567890, that is exactly what I do at the end of the answer, in the last line of code - MaxU
    • The pd.read_csv(file) method returns a DataFrame, but I need a NumPy array - Alexandr1234567890
    • @Alexandr1234567890, I have added that to the answer... - MaxU
    • Sorry, apparently I had not specified the separator. After updating Pandas, the to_numpy method became available and everything worked - Alexandr1234567890
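    For the case raised in the comments, loading the file straight into a NumPy array without building a DataFrame, the standard-library csv module also understands the double-quoted multi-line fields. This is a swapped-in alternative, not the approach from the answer; it is a minimal sketch with hypothetical semicolon-separated content inlined as a string. With a real file you would open it with open(path, newline="", encoding="utf-8") and pass the file object to csv.reader. Note that every value comes back as str, so numeric columns need converting yourself.

```python
import csv
import io

import numpy as np

# Hypothetical file content: ';'-separated, with one quoted field spanning three lines.
csv_text = 'id;text;val\n1;aaa;10\n2;"xxx\nyyy\nzzz";20\n3;ccc;30\n'

# csv.reader honours the double quotes, so the embedded newlines stay inside one field.
reader = csv.reader(io.StringIO(csv_text), delimiter=";")
header = next(reader)  # first row holds the column names
rows = list(reader)

# dtype=object keeps each field as a Python str, like genfromtxt(dtype=object) would.
arr = np.array(rows, dtype=object)
print(header)
print(arr)
```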