I need to convert a docx file (it contains tables, pictures, etc.) into HTML and display it on a page. I found three more or less decent libraries: docx4j, JODConverter and Apache POI. I have already figured out how to convert the entire file to HTML.
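
If the docx is split before conversion, its structure already suggests natural cut points. A .docx file is a ZIP archive whose main content lives in `word/document.xml`. There, `w:body` holds a flat sequence of top-level paragraphs (`w:p`) and tables (`w:tbl`). A nested table sits inside a cell of its parent `w:tbl`, so it never appears at the top level. Below is a minimal sketch that uses only the JDK (no docx4j or POI) and lists those top-level blocks. The names `DocxBlocks` and `topLevelBlocks` are hypothetical. The sketch also ignores other body children such as `w:sdt` content controls and `w:sectPr`.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

/** Hypothetical helper: lists top-level blocks of a .docx body ("p" = paragraph, "tbl" = table). */
final class DocxBlocks {
    // WordprocessingML namespace used in word/document.xml
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static List<String> topLevelBlocks(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (!e.getName().equals("word/document.xml")) continue;
                // readAllBytes stops at the end of the current ZIP entry
                byte[] xml = zip.readAllBytes();
                DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
                f.setNamespaceAware(true);
                Document doc = f.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
                Node body = doc.getElementsByTagNameNS(W_NS, "body").item(0);
                if (body == null) throw new IllegalArgumentException("no w:body element");
                List<String> blocks = new ArrayList<>();
                // Only direct children of w:body; nested tables stay inside their parent w:tbl
                for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
                    if (n.getNodeType() == Node.ELEMENT_NODE && W_NS.equals(n.getNamespaceURI())
                            && ("p".equals(n.getLocalName()) || "tbl".equals(n.getLocalName()))) {
                        blocks.add(n.getLocalName());
                    }
                }
                return blocks;
            }
        }
        throw new IllegalArgumentException("word/document.xml not found");
    }
}
```

Each top-level block is a safe place to cut. A part can then be built from a run of consecutive blocks and converted with whichever library is used for the full file.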

My problem is that I cannot come up with a criterion for splitting the file. Maybe there is a way to find out the height/width of each HTML fragment during conversion. Or, while splitting, I could emulate a screen and cut off a part as soon as the text goes beyond its bounds. Tables are a separate problem: they can be very large, or even contain nested tables. I can either split the docx into parts and then convert each part to HTML, or convert the whole docx and then split the HTML. The approach and the code do not matter; all ideas are welcome.
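The "emulate a screen" idea can be sketched as greedy packing. Each block (paragraph, image, table row) carries an estimated pixel height. Blocks are appended to the current part until the next one would exceed the screen height, and then a new part starts. Tables are split between rows, and each piece is re-wrapped in its own `<table>`. This is a minimal sketch under assumptions: the heights are supplied by the caller, and `ScreenPacker`, `Unit` and `Table` are hypothetical names. It requires Java 16+ for records and pattern matching with `instanceof`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: greedily packs blocks with estimated pixel heights into screen-sized HTML parts. */
final class ScreenPacker {
    interface Block {}
    /** A paragraph, image or table row with its (assumed) estimated height in pixels. */
    record Unit(String html, int height) implements Block {}
    /** A table whose rows may be distributed over several parts. */
    record Table(List<Unit> rows) implements Block {}

    private final int budget;                       // max pixel height of one part
    private final List<String> parts = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();
    private int used = 0;                           // pixels used in the current part

    private ScreenPacker(int budget) { this.budget = budget; }

    static List<String> pack(List<Block> blocks, int budget) {
        ScreenPacker p = new ScreenPacker(budget);
        for (Block b : blocks) {
            if (b instanceof Table t) {
                p.addTable(t);
            } else if (b instanceof Unit u) {
                p.startNewPartIfFull(u.height());
                p.current.append(u.html());
                p.used += u.height();
            }
        }
        p.flush();
        return p.parts;
    }

    // If h no longer fits, close the current part. An empty part always accepts
    // the block, so a block taller than the budget becomes a part of its own.
    private void startNewPartIfFull(int h) {
        if (used > 0 && used + h > budget) flush();
    }

    private void flush() {
        if (current.length() > 0) parts.add(current.toString());
        current.setLength(0);
        used = 0;
    }

    // Split a table between rows; each piece gets its own <table> element.
    private void addTable(Table t) {
        StringBuilder rows = new StringBuilder();
        for (Unit r : t.rows()) {
            if (used > 0 && used + r.height() > budget) {
                closeTablePiece(rows);
                flush();
            }
            rows.append(r.html());
            used += r.height();
        }
        closeTablePiece(rows);
    }

    private void closeTablePiece(StringBuilder rows) {
        if (rows.length() > 0) current.append("<table>").append(rows).append("</table>");
        rows.setLength(0);
    }
}
```

The budget would be roughly the screen height. A single row that contains a nested table is treated as one unit and is never cut. If that row is taller than the budget, it simply produces an oversized part. Header rows are not repeated in continuation pieces; that would need extra handling.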

Everything (both the text and the HTML) will be stored in the database, so the data can be loaded in parts while scrolling rather than all at once. Each part should be about the size of the screen, maybe a little larger. To keep it simple, let's assume a resolution of 1024×768.
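With a fixed 1024×768 screen, block heights can be roughly estimated on the server without a browser. This is a crude sketch. The typography numbers (about 8 px average glyph width, 20 px line height, 16 px paragraph margin) are assumptions, not measured values. The sketch also ignores explicit line breaks, mixed font sizes and real word wrapping. `HeightEstimator` is a hypothetical name. The resulting heights could serve as per-block heights for screen-sized splitting, with 768 px as the per-part budget.

```java
/** Hypothetical, rough pixel-height estimates for content rendered at a fixed width. */
final class HeightEstimator {
    // Assumed typography (not measured): average glyph width, line height, paragraph margin.
    static final int AVG_CHAR_WIDTH_PX = 8;
    static final int LINE_HEIGHT_PX = 20;
    static final int PARAGRAPH_MARGIN_PX = 16;
    // Assumed screen from the question: 1024 px wide, 768 px per part.
    static final int SCREEN_WIDTH_PX = 1024;
    static final int SCREEN_HEIGHT_PX = 768;

    /** Estimated height of a plain-text paragraph: wrapped line count times line height. */
    static int paragraphHeight(String text, int contentWidthPx) {
        int charsPerLine = Math.max(1, contentWidthPx / AVG_CHAR_WIDTH_PX);
        int lines = Math.max(1, (text.length() + charsPerLine - 1) / charsPerLine);
        return lines * LINE_HEIGHT_PX + PARAGRAPH_MARGIN_PX;
    }

    /** Image height after scaling down to the content width, keeping the aspect ratio. */
    static int imageHeight(int widthPx, int heightPx, int contentWidthPx) {
        if (widthPx <= contentWidthPx) return heightPx;
        return (int) Math.round((double) heightPx * contentWidthPx / widthPx);
    }
}
```

For example, a 300-character paragraph at 1024 px wraps to 3 lines under these assumptions, so its estimated height is 76 px. Page margins and the scrollbar reduce the usable width in practice. If the estimates turn out too rough, measuring the rendered height in the browser is more accurate.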

Please suggest a sensible approach or explain how this can be done; I don't have enough experience and knowledge for it. Sample code would be even better. If anything is unclear or confusing, please ask. Thanks.

  • Please break the text into meaningful parts, i.e. paragraphs. It's impossible to read :) - Kromster
  • The document itself is already divided into pages. Isn't that enough for you? If not, then apparently the data from the file will not be presented on the site as a document? Then just split it manually. - iksuy
  • @iksuy Internally, the document is not divided into pages. Splitting it manually is clear enough, but by what criterion? So that every part looks about the same height in the browser? - Elizar
