Cannot read data from a csv file in Python 3.5: UnicodeDecodeError: 'utf-8' codec can't decode byte

Question

The code:

    import csv

    csvfile = open("pr.csv", mode="r")
    spamreader = csv.reader(csvfile, delimiter=';')
    for row in spamreader:
        print(row)

File contents:

    абв
    fffff
    выфвфыв

Error:

    Traceback (most recent call last):
      File "/home/ti/PycharmProjects/parserwordstat/123.py", line 10, in <module>
        for row in spamreader:
      File "/usr/lib/python3.5/codecs.py", line 321, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 0: invalid continuation byte

I also tried the command

    open("pr.csv", encoding="utf-8").read()

which gives the same UnicodeDecodeError.

The raw contents of the file:

    print(open("pr.csv", "rb").read())

    b'\xe0\xe1\xe2\r\nfffff\r\n\xe2\xfb\xf4\xe2\xf4\xfb\xe2\r\n'

@jfs Yes, it is a UnicodeDecodeError. This is what the code you gave prints: b'\xe0\xe1\xe2\r\nfffff\r\n\xe2\xfb\xf4\xe2\xf4\xfb\xe2\r\n'. In general, I do have some idea about encodings.

Accepted Answer · 2016-11-18T01:26:07

The file is encoded in cp1251, not utf-8:

    >>> b'\xe0\xe1\xe2\r\nfffff\r\n'.decode("cp1251")
    'абв\r\nfffff\r\n'
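To see why the utf-8 decoder rejects these bytes while cp1251 accepts them, here is a minimal sketch using the byte string from the question. The variable names (`data`, `utf8_error`, `text`) are illustrative only.

```python
# Raw bytes of pr.csv, as printed in the question
data = b'\xe0\xe1\xe2\r\nfffff\r\n\xe2\xfb\xf4\xe2\xf4\xfb\xe2\r\n'

try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    # In utf-8, 0xe0 announces a three-byte sequence, but the next byte
    # (0xe1) is not a continuation byte (0x80-0xbf), so decoding fails
    # at position 0.
    utf8_error = exc
    print(exc)

# cp1251 is a single-byte encoding: every byte maps to one character
text = data.decode("cp1251")
print(text.splitlines())  # ['абв', 'fffff', 'выфвфыв']
```

Any single-byte encoding would decode these bytes without raising, so a successful decode is not proof by itself; cp1251 is the plausible choice here because the result is readable Russian text.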

You are probably trying to open a file created on Russian Windows (where the ANSI code page is cp1251) on Unix, where utf-8 is often the value of locale.getpreferredencoding(False), the text encoding used by default when opening files with open().
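A quick way to check which default applies on a given machine is to print that value. This is only a sketch; `default_encoding` is an illustrative name, and the result depends on the system's locale settings.

```python
import locale

# Encoding that open() normally falls back to in text mode
# when no encoding= argument is passed
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```

On a typical Linux desktop this is likely to print a utf-8 variant, while on Russian Windows it would usually be cp1251, which would explain why the same file opens without errors there.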

To read the csv file, pass the desired encoding explicitly:

    with open("pr.csv", encoding="cp1251", newline='') as file:
        ...
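Put together with the loop from the question, the whole thing might look like the sketch below. To keep it runnable on its own, it first recreates the file from the bytes shown in the question in a temporary location; `read_rows` is a hypothetical helper, not part of the csv module.

```python
import csv
import os
import tempfile

# Recreate the file from the question so the sketch runs on its own;
# with the real file, skip this part and call read_rows("pr.csv").
sample = b'\xe0\xe1\xe2\r\nfffff\r\n\xe2\xfb\xf4\xe2\xf4\xfb\xe2\r\n'
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name


def read_rows(filename, encoding="cp1251"):
    """Return all rows of a ';'-delimited csv file (hypothetical helper)."""
    # newline='' leaves line-ending handling to the csv module
    with open(filename, encoding=encoding, newline='') as file:
        return list(csv.reader(file, delimiter=';'))


rows = read_rows(path)
os.remove(path)  # clean up the temporary copy

for row in rows:
    print(row)
```

Using `with` also closes the file once reading is done, which the original snippet did not do.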
