Problem with the encoding of Russian-language characters extracted from a PDF

Question

I need to check whether strings obtained from web page elements or from an SQL query appear in a report in PDF format.

The problem is as follows: using the SQL query, I get string values, for example query[0][0] = 'Stick', which is the name of the object the report form is printed for. In the report itself, that name ('Stick') comes out as: ɉɚɥɤɚ.

As far as I understood from this topic: https://stackoverflow.com/questions/22325228/how-get-russian-words-from-pdf-file, Russian-language characters in a PDF are stored as glyph IDs rather than as ordinary text.

The question is: is it possible in Python to convert strings into the representation used in the PDF or, conversely, to convert glyph IDs to UTF-8? If so, how? Unfortunately, I don't know enough to apply the information from that topic.
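
For what it's worth, here is a rough sketch of one possible approach. In the sample string ɉɚɥɤɚ, every character sits exactly 0x1D6 code points below the corresponding letter of "Палка" (Russian for "stick"), so a constant shift seems to recover the text. The offset, the function names, and the assumption that the shift is the same for every Cyrillic letter are all guesses drawn from that single sample; another font or PDF may need a different mapping or a full lookup table.

```python
# Hypothetical offset, derived only from the single sample in the question:
# U+0249 + 0x1D6 == U+041F ('П'), U+025A + 0x1D6 == U+0430 ('а'), and so on.
PDF_OFFSET = 0x1D6

# Bounds of the basic Cyrillic Unicode block.
CYRILLIC_START, CYRILLIC_END = 0x0400, 0x04FF


def pdf_to_cyrillic(text, offset=PDF_OFFSET):
    """Shift extracted characters up by `offset` when the result is Cyrillic."""
    result = []
    for ch in text:
        shifted = ord(ch) + offset
        if CYRILLIC_START <= shifted <= CYRILLIC_END:
            result.append(chr(shifted))
        else:
            # Leave spaces, digits, Latin letters, etc. untouched.
            result.append(ch)
    return "".join(result)


def cyrillic_to_pdf(text, offset=PDF_OFFSET):
    """Inverse direction: turn Cyrillic letters into the form seen in the PDF text."""
    result = []
    for ch in text:
        if CYRILLIC_START <= ord(ch) <= CYRILLIC_END:
            result.append(chr(ord(ch) - offset))
        else:
            result.append(ch)
    return "".join(result)


def report_contains(extracted_text, expected):
    """Check whether `expected` (normal Unicode) occurs in the decoded PDF text."""
    return expected in pdf_to_cyrillic(extracted_text)
```

With this, `pdf_to_cyrillic("ɉɚɥɤɚ")` gives "Палка", and `report_contains(page_text, name)` could be used to compare a value from the SQL query against text extracted from the report, where `page_text` and `name` stand for whatever the extraction step and the query return.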

I suspect the cause may not be the PDF itself, but the way it is displayed in the browser.
