Python encoding error: readline() fails while reading a UTF-8 file: 'charmap' codec can't decode byte

Question

I'm trying to read the ports file from IANA. It is stored as UTF-8 without a BOM. But on one of the lines, readline() fails with this error:

'charmap' codec can't decode byte 0x98 in position 7938: character maps to <undefined>

The line in the file looks like this:

# Jim Harlan <jimh&infowest.com>

What workaround could I use for this? Or is there a proper fix?

UPD

As a workaround, deleting this line will do (for some reason it is the only such line in the file), but only while debugging, because otherwise my partners will tear my hair out later. Here is the code I use for this operation:

    try:
        file = open(path, 'r')
        while True:
            line = file.readline()
            if not line:
                break
            print(line)
    finally:
        file.close()

Accepted Answer · 2011-09-18T21:19:10

Try using the codecs module from the standard library:

    import codecs
    fileObj = codecs.open("someFilePath", "r", "utf_8_sig")
    text = fileObj.read()  # or read it line by line
    fileObj.close()
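As a rough illustration of what the utf_8_sig codec changes, the sketch below decodes the same text with and without a BOM. The sample line is hypothetical. Since the file in the question is described as UTF-8 without a BOM, both codecs should decode it identically, so the codec choice alone would not be expected to remove the 'charmap' error.

```python
import codecs

# Hypothetical sample line, not taken from the IANA file.
text = u"ftp 21/tcp\n"
with_bom = codecs.BOM_UTF8 + text.encode("utf-8")
without_bom = text.encode("utf-8")

# "utf-8" keeps a BOM as a leading U+FEFF character.
plain = with_bom.decode("utf-8")

# "utf-8-sig" strips a leading BOM if present and changes nothing otherwise.
sig_with = with_bom.decode("utf-8-sig")
sig_without = without_bom.decode("utf-8-sig")
```

The names "utf_8_sig" and "utf-8-sig" refer to the same codec.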

rnd_d


So the error turned up even earlier: 'charmap' codec can't encode characters in position 29-30: character maps to <undefined> - Dex
Added some corrections to the question. - Dex
For UTF-8 with a BOM, you need to change the encoding in open() to "utf_8_sig". - rnd_d
You could say it's 50/50. The problem with the first file was solved by deleting the unfortunate line. The new file is in a different format. So perhaps you are right and it was just a fluke. But you get your upvote :) - Dex
Do not use codecs, which may not work correctly with universal newlines mode. Use io.open() instead. - jfs


Community spirit ♦ · Answer 2 · 2016-11-16T22:06:42

To read a UTF-8 encoded text file in Python, you can use the io.open() function, which is available as the built-in open() in Python 3:

    #!/usr/bin/env python
    import io

    with io.open(path, encoding='utf-8') as file:
        for line in file:
            process(line)
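A minimal sketch of why naming the encoding matters: when no encoding is passed, open() falls back to the locale default, which on Windows is typically a legacy code page. The example assumes that code page is cp1251 (an assumption; the question does not say which one was in use) and uses a hypothetical sample line containing a left single quotation mark, whose UTF-8 encoding ends in byte 0x98.

```python
import io
import os
import tempfile

# Hypothetical sample line; U+2018 encodes to the bytes E2 80 98 in UTF-8.
sample = u"# quoted \u2018name\u2019\n"

# Write the sample to a temporary file as UTF-8 without a BOM.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as raw:
    raw.write(sample.encode("utf-8"))

try:
    # Assumed legacy code page: byte 0x98 has no mapping in cp1251.
    try:
        with io.open(path, encoding="cp1251") as f:
            f.read()
        legacy_error = None
    except UnicodeDecodeError as exc:
        legacy_error = exc

    # Naming the real encoding decodes the same bytes without error.
    with io.open(path, encoding="utf-8") as f:
        lines = [line for line in f]
finally:
    os.remove(path)
```

Under that assumption, the first read raises the same kind of UnicodeDecodeError as in the question, while the second returns the line intact.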

If the encoding is correct overall but the file may contain occasional invalid bytes, you can pass an error handler such as errors='ignore' (or another value, depending on the situation).
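To make the difference between error handlers concrete, here is a small sketch using a hypothetical line that is valid UTF-8 except for one stray byte. The helper read_with is introduced only for this illustration.

```python
import io
import os
import tempfile

# Hypothetical data: UTF-8 text with one invalid byte (0xFF) in the middle.
damaged = b"http \xff 80/tcp\n"

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as raw:
    raw.write(damaged)


def read_with(errors):
    """Read the whole file as UTF-8 using the given error handler."""
    with io.open(path, encoding="utf-8", errors=errors) as f:
        return f.read()


try:
    ignored = read_with("ignore")    # drops the undecodable byte silently
    replaced = read_with("replace")  # substitutes U+FFFD for it
finally:
    os.remove(path)
```

'ignore' loses data without any trace, so 'replace' is often the safer choice when the damaged spots need to stay visible.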

Do not use codecs, which may not work correctly with universal newlines mode.
You do not need to change your code page to cp65001 to read a UTF-8 file.
If you want to print Unicode to the Windows console, see "How can I output a Unicode string to a Windows console from Python?"
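The "can't encode" variant of the error mentioned in the comments comes from output rather than from reading the file. A hedged sketch of the distinction follows, assuming the console encoding is cp866 (an assumption; the real value depends on the machine) and using a hypothetical string.

```python
# Hypothetical text containing characters that cp866 cannot represent.
text = u"quoted \u2018name\u2019"

# Assumed console encoding; the real one depends on the machine.
console_encoding = "cp866"

try:
    text.encode(console_encoding)
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc

# A lossy fallback: unencodable characters become \uXXXX escapes.
safe_bytes = text.encode(console_encoding, errors="backslashreplace")
```

This only shows where the error originates; for actually displaying such characters, the linked question is the better guide.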

Ali · Answer 3 · 2011-09-19T08:26:19

    import codecs
    file = codecs.open(path, encoding='utf-8', mode='r')

Ali


I already tried that; it did not work. - Dex
'utf-8', not 'utf-8-sig' - Ali
I tried that. Before this there was an answer that used utf-8. - Dex


Community spirit ♦ · Answer 4 · 2016-11-16T21:32:31

I kept running into this error, time after time. I found the solution here.

    import codecs
    file = codecs.open("yourFile", "r", "utf-8")
    data = file.read()
    file.close()

And on the command line: chcp 65001

These simple steps solved the problem.

I see no connection between reading a file and chcp 65001.
Not to mention that chcp 65001 is a flawed solution that should be avoided in favor of solutions that use Unicode APIs, such as win-unicode-console or PEP 528.
