Hello. I need to read data from a PDF file on Android. After trying several libraries, I settled on iText (it was the only one that caused no additional problems). The file is read, and fairly quickly. However, I could only extract the contents as continuous text. Hence the question: is there any way to read structured data with this library (or some other)? The PDF file itself is structured as follows: Example

That is, there is a header, followed by data in table cells, periodically separated by a date. Here is how I currently read the data and write it to a plain .txt file:

    // iText 5 classes: com.itextpdf.text.pdf.PdfReader and com.itextpdf.text.pdf.parser.*
    PdfReader reader = new PdfReader(path + "//files//timetable.pdf");
    PdfReaderContentParser parser = new PdfReaderContentParser(reader);
    PrintWriter out = new PrintWriter(new FileOutputStream(path + "//files//result.txt"));
    TextExtractionStrategy strategy;
    // Pages are numbered from 1; write each page's plain text to the output file.
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
        out.println(strategy.getResultantText());
    }
    reader.close();
    out.flush();
    out.close();

P.S. I found the TabulaPDF library, but it cannot be used on Android because of its dependency on AWT. Thanks in advance for any answers.

UPD1: Since there is no solution for PDF yet, I used the desktop version of TabulaPDF, through its graphical interface, to convert the PDF to CSV, and then easily read the data from it with OpenCSV. Perhaps someone knows a way to do this conversion on the device itself?

    1 answer

    Unfortunately not; it is almost impossible. In very rare cases a PDF can hold some structured information about tables: if memory serves, you can, for example, save an Illustrator (AI) file as a PDF without destroying its structure (text / tables). As a rule, though, a PDF is rendered into print-ready vectors plus text (which is why it is hard to edit, and why you have to delete line breaks when copying from a PDF :-). So there is no general solution, only AI / OCR.
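
For one fixed layout such as this timetable, a position-based heuristic may still be worth trying, even though it is not a general solution. As far as I know, iText 5 can report each piece of text together with its position through a custom `RenderListener` (`TextRenderInfo.getText()` and `getBaseline()`). The sketch below does not call iText at all; it only shows the grouping step. The names `TableRows`, `Chunk`, `groupIntoRows` and `yTolerance`, as well as the sample coordinates, are made up for illustration, and it assumes that cells of one table row share roughly the same baseline.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: groups positioned text chunks into table rows. */
class TableRows {

    /** Hypothetical holder for one piece of text and the start of its baseline. */
    static final class Chunk {
        final String text;
        final float x; // horizontal position, grows to the right
        final float y; // vertical position, grows upward as in PDF user space

        Chunk(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Orders chunks from the top of the page down, starts a new row whenever a
     * chunk's baseline differs from the first chunk of the current row by more
     * than yTolerance, then orders the cells of each row from left to right.
     */
    static List<List<String>> groupIntoRows(List<Chunk> chunks, float yTolerance) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort((a, b) -> Float.compare(b.y, a.y)); // larger y = higher on the page

        List<List<Chunk>> rows = new ArrayList<>();
        float rowY = 0f; // baseline of the first chunk in the current row
        for (Chunk c : sorted) {
            if (rows.isEmpty() || Math.abs(rowY - c.y) > yTolerance) {
                rows.add(new ArrayList<>());
                rowY = c.y;
            }
            rows.get(rows.size() - 1).add(c);
        }

        List<List<String>> result = new ArrayList<>();
        for (List<Chunk> row : rows) {
            row.sort((a, b) -> Float.compare(a.x, b.x)); // left to right
            List<String> cells = new ArrayList<>();
            for (Chunk c : row) {
                cells.add(c.text);
            }
            result.add(cells);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for chunks collected from a PDF page.
        List<Chunk> chunks = new ArrayList<>();
        chunks.add(new Chunk("Math", 120f, 700.4f));
        chunks.add(new Chunk("08:30", 40f, 700f));
        chunks.add(new Chunk("Physics", 120f, 680f));
        chunks.add(new Chunk("10:15", 40f, 680.3f));
        for (List<String> row : groupIntoRows(chunks, 2f)) {
            System.out.println(String.join(";", row)); // one CSV-like line per row
        }
    }
}
```

The tolerance has to be tuned per document, and merged cells, wrapped text or rotated pages will break the grouping, so this only stands a chance on a layout you know in advance.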