To extract text from a PDF file, I used the PDFMiner module. When the code runs, the text from the PDF is printed to the console, but I cannot write it to a file afterwards: the file ends up containing a single word, None.

How can I write the text recognized by this module to a file?

Here is the code:

    import sys
    import pdfminer.high_level  # $ pip install pdfminer.six

    # The name of the file to recognize is passed here as the first argument
    with open('mail_cir.pdf', 'rb') as file:
        # The module itself does its work here. I assign the result of the
        # module's work to the variable text and immediately convert it to a string (str)
        text = str(pdfminer.high_level.extract_text_to_fp(file, sys.stdout))

    file2 = open("Текст.txt", "w")  # Create a file named "Текст.txt"
    file2.write(text)  # Write the result to the file

As a result, the file contains only the word None, whereas it should contain all the text recognized from the PDF.

  • Everything works now. How can I thank you on this site? I haven't found a way to rate the answer or to give you a like, for example. - Parsing_teh

1 answer

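Remove the str() call and the lines after it that write text to the file. Open file2 before calling extract_text_to_fp and pass it in place of sys.stdout: the function writes the extracted text to the file object it is given and returns None, which is why the file contained only the word None.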
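
Put together, the fix could look roughly like the sketch below. The helper name save_pdf_text and its optional extract argument are hypothetical and only there for illustration; the one real library call is pdfminer.high_level.extract_text_to_fp, which is given the open output file instead of sys.stdout. The UTF-8 encoding of the output file is an assumption on my part.

```python
def save_pdf_text(pdf_path, txt_path, extract=None):
    """Write the text extracted from pdf_path into txt_path.

    save_pdf_text and the `extract` hook are hypothetical names used for
    illustration; by default the real pdfminer.six function is used.
    """
    if extract is None:
        # $ pip install pdfminer.six
        from pdfminer.high_level import extract_text_to_fp as extract

    # Open the PDF in binary mode and the output file in text mode.
    # UTF-8 is an assumed encoding for the output file.
    with open(pdf_path, "rb") as pdf_file, \
            open(txt_path, "w", encoding="utf-8") as out_file:
        # The function writes into out_file and returns None,
        # so there is nothing to assign or to wrap in str().
        extract(pdf_file, out_file)
```

With the file names from the question, the call would be save_pdf_text("mail_cir.pdf", "Текст.txt"). Both files are closed automatically when the with block ends, so no separate file2.close() is needed.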