I need help with the following problem: rows are lost when I read a file with pandas.read_csv. Here is what I do, step by step:

    # read the data into a DataFrame
    import pandas as pan

    d = pan.read_csv('card.txt', sep='|', encoding='cp1251')
    print(len(d))

To check the result, I count the lines in the file directly:

    # read the file contents
    import codecs

    fh = codecs.open('card.txt', 'r', encoding='cp1251')
    text = list()
    # split into lines
    for line in fh:
        line = line.rstrip()
        text.append(line)
    fh.close()
    print(len(text))

We get len(text) - len(d) = 214000

What could be the problem? I would be very grateful for the help.

  • If the file has no header row, you can pass header=None to read_csv. More likely, though, the data itself contains errors: if you add error_bad_lines=False to the read_csv call, pandas skips the malformed lines and reports which ones it could not read. Most likely, the number of fields in some lines does not match the number of columns in the header. - annndrey
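
One way to test the hypothesis from the comment above, without relying on pandas at all, is to count the fields on every line and compare the count with the header. The sketch below is only an illustration: the helper name `find_irregular_lines` is hypothetical, it assumes the first line of the file is the header, and it splits on the separator naively, so it does not account for quoted fields that contain `|` or line breaks. Note also that `error_bad_lines` was replaced by `on_bad_lines` in newer pandas releases, so the exact option depends on the installed version.

```python
import io


def find_irregular_lines(path, sep='|', encoding='cp1251'):
    """Return (expected_field_count, [(line_number, field_count), ...]).

    Hypothetical helper: assumes the first line is the header and
    ignores quoting, so it is a rough diagnostic, not a CSV parser.
    """
    irregular = []
    with io.open(path, 'r', encoding=encoding) as fh:
        # The header defines how many fields every data line should have.
        header = fh.readline().rstrip('\r\n')
        expected = len(header.split(sep))
        # Data lines start at line 2 of the file.
        for lineno, line in enumerate(fh, start=2):
            count = len(line.rstrip('\r\n').split(sep))
            if count != expected:
                irregular.append((lineno, count))
    return expected, irregular
```

Calling `find_irregular_lines('card.txt')` returns the expected number of fields and the list of line numbers whose field count differs. If that list is empty while rows are still missing, the cause is probably something this naive split cannot see, such as quote characters in the data.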
