Hello! I am trying to read a .docx file using the Apache POI Java API. I use:
    import java.io.FileInputStream;
    import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;

    public static String view(String nameDoc) {
        String text = null;
        // try-with-resources closes the extractor and the document, even on failure
        try (XWPFDocument docx = new XWPFDocument(new FileInputStream(nameDoc));
             XWPFWordExtractor we = new XWPFWordExtractor(docx)) {
            // Extracts the plain text of the document
            text = we.getText();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return text;
    }

With this approach I only get the text of the file, but my files differ from one another: some contain not only text but also tables, images, etc. How do I get the full file content?
On Max's advice, I tried WordToHtmlConverter:
    import java.io.FileInputStream;
    import java.io.StringWriter;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.converter.WordToHtmlConverter;
    import org.w3c.dom.Document;

    public static String getDocHtml(String nameDoc) {
        String html = null;
        try {
            // Empty DOM document that the converter writes its HTML into
            Document newDocument = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(newDocument);

            // Load the Word file and convert it
            HWPFDocument doc = new HWPFDocument(new FileInputStream(nameDoc));
            wordToHtmlConverter.processDocument(doc);

            // Serialize the resulting DOM to an HTML string
            StringWriter stringWriter = new StringWriter();
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
            transformer.setOutputProperty(OutputKeys.METHOD, "html");
            transformer.transform(
                    new DOMSource(wordToHtmlConverter.getDocument()),
                    new StreamResult(stringWriter));
            html = stringWriter.toString();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return html;
    }

I pass the result to a JSP, but nothing appears on the page. The error is:

    org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
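
The exception text points at a format mismatch: HWPFDocument belongs to the part of POI that reads OLE2 files (the legacy binary .doc), while a .docx is an Office 2007+ XML package, which is a ZIP archive and is handled by POI's XWPF classes such as XWPFDocument. As a rough illustration of the difference, here is a minimal sketch that looks at the first bytes of a file to tell the two containers apart before choosing a POI class. It uses only the Java standard library; the class name WordFormatSniffer and its methods are hypothetical and not part of POI, and a ZIP signature only shows that the file is a ZIP archive, not that it is a valid .docx.

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    // Hypothetical helper, not part of Apache POI.
    public class WordFormatSniffer {

        public enum Format { OLE2_DOC, OOXML_DOCX, UNKNOWN }

        // OLE2 compound files (legacy .doc) start with these 8 bytes.
        private static final int[] OLE2_MAGIC =
                {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
        // ZIP archives (the container used by .docx) start with "PK" 0x03 0x04.
        private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

        // Classifies a file from its leading bytes.
        public static Format sniff(byte[] head) {
            if (startsWith(head, OLE2_MAGIC)) return Format.OLE2_DOC;
            if (startsWith(head, ZIP_MAGIC)) return Format.OOXML_DOCX;
            return Format.UNKNOWN;
        }

        // Reads up to 8 bytes from the stream; returns fewer if the stream is shorter.
        public static byte[] readHead(InputStream in) throws IOException {
            byte[] buf = new byte[8];
            int n = 0;
            while (n < buf.length) {
                int r = in.read(buf, n, buf.length - n);
                if (r < 0) break;
                n += r;
            }
            byte[] head = new byte[n];
            System.arraycopy(buf, 0, head, 0, n);
            return head;
        }

        private static boolean startsWith(byte[] data, int[] magic) {
            if (data.length < magic.length) return false;
            for (int i = 0; i < magic.length; i++) {
                if ((data[i] & 0xFF) != magic[i]) return false;
            }
            return true;
        }

        public static void main(String[] args) throws IOException {
            // In-memory stand-in for the first bytes of a .docx file
            byte[] zipLike = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00};
            System.out.println(sniff(readHead(new ByteArrayInputStream(zipLike))));
        }
    }

With a real file, the stream passed to readHead would be a FileInputStream on nameDoc. A result of OOXML_DOCX suggests opening the file with XWPFDocument, and OLE2_DOC suggests HWPFDocument.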