I want to extract the text from a .doc or .docx file in Java, keeping the same font and size. But so far I can't even get the text out. I have tried many approaches, but the output always comes out as squares (unreadable characters).

The following code is used:

    package myconverter;

    import com.itextpdf.text.Anchor;
    import com.itextpdf.text.Document;
    import com.itextpdf.text.DocumentException;
    import com.itextpdf.text.Font;
    import com.itextpdf.text.FontFactory;
    import com.itextpdf.text.PageSize;
    import com.itextpdf.text.pdf.PdfWriter;
    import com.itextpdf.text.Paragraph;
    import com.itextpdf.text.pdf.CMYKColor;
    import java.io.BufferedInputStream;
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.FileOutputStream;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.InputStream;

    public class Myconverter {

        public static void main(String[] args) throws FileNotFoundException, DocumentException, IOException {
            String result = "";
            String line;
            String doc = "C:\\samples\\HelloWorld.doc";
            String pdf = doc.substring(0, doc.lastIndexOf('.') + 1) + "pdf";

            Document document = new Document(PageSize.A4, 50, 50, 50, 50);
            PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(pdf));
            document.open();

            Anchor anchorTarget = new Anchor("");
            anchorTarget.setName("BackToTop");
            Paragraph paragraph1 = new Paragraph();
            // paragraph1.setSpacingBefore(50);
            paragraph1.add(anchorTarget);
            document.add(paragraph1);

            File file = new File(doc);
            try (FileReader fr = new FileReader(file);
                 BufferedReader br = new BufferedReader(fr)) {
                while ((line = br.readLine()) != null) {
                    result = line;
                    // System.out.println(result);
                    //document.add(new Paragraph(result, FontFactory.getFont(FontFactory.COURIER, 14, Font.BOLD,new CMYKColor(0, 255, 0, 0))));
                    document.add(new Paragraph(result));
                }
            }
            // System.out.println(result);
            document.close();
        }
    }
  • What does "with the same font, size" mean? In what format? rtf? - vp_arth
  • And where do you insert it? - Riĥard Brugekĥaim
  • I am just writing a Word-to-PDF converter. I wanted to first extract the text from the Word file, assign it to a variable, and then write it to the PDF - Jasmin
  • Then tell me which libraries you use to read from Word and write to PDF, and I'll look at what functions they have for setting the style. But yes, it would be better if you showed your code (exactly where you copy the text style). - Riĥard Brugekĥaim
  • @RiĥardBrugekĥaim I haven't done anything with the style yet; I decided to just get the text out first and only then ... - Jasmin

1 answer

Well, FileReader is not really intended for this: it is meant for plain-text formats, and Word files are not plain text.
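To see why reading the file as text produces unreadable characters, it helps to look at what a .docx actually is: a ZIP archive whose main part, word/document.xml, holds the text in w:t elements grouped into runs (w:r). The following is a minimal sketch using only the JDK, not the approach from the question or from any library. It lists each run with the font and size set directly on it. The class name DocxRunDump and the helper tinyDocx are made up for illustration; tinyDocx builds a stripped-down archive for the demo, not a file Word would open. Fonts and sizes inherited from styles are not resolved (they show up as "default"), and the old binary .doc format is not handled at all.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DocxRunDump {
    // WordprocessingML namespace used by the w: prefix in word/document.xml
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns one line per non-empty run: "font size | text". */
    static List<String> describeRuns(InputStream docx) throws Exception {
        byte[] xml = null;
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
                if (e.getName().equals("word/document.xml")) {
                    xml = zip.readAllBytes(); // reads to the end of this entry (Java 9+)
                    break;
                }
            }
        }
        if (xml == null) {
            throw new IllegalArgumentException("word/document.xml not found: not a .docx file?");
        }
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document dom = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        List<String> lines = new ArrayList<>();
        NodeList runs = dom.getElementsByTagNameNS(W, "r");
        for (int i = 0; i < runs.getLength(); i++) {
            Element run = (Element) runs.item(i);
            StringBuilder text = new StringBuilder();
            NodeList parts = run.getElementsByTagNameNS(W, "t");
            for (int j = 0; j < parts.getLength(); j++) {
                text.append(parts.item(j).getTextContent());
            }
            if (text.length() == 0) continue; // run without text (e.g. only a break)
            String font = attr(run, "rFonts", "ascii");
            String sz = attr(run, "sz", "val"); // w:sz is stored in half-points
            lines.add((font.isEmpty() ? "default" : font) + " "
                    + (sz.isEmpty() ? "default" : (Integer.parseInt(sz) / 2.0) + "pt")
                    + " | " + text);
        }
        return lines;
    }

    /** Value of w:name on the first w:tag element inside the run, or "" if absent. */
    static String attr(Element run, String tag, String name) {
        NodeList found = run.getElementsByTagNameNS(W, tag);
        return found.getLength() == 0 ? "" : ((Element) found.item(0)).getAttributeNS(W, name);
    }

    /** Hypothetical demo fixture: a ZIP with only word/document.xml and a single run. */
    static byte[] tinyDocx(String font, int halfPoints, String text) throws Exception {
        String xml = "<w:document xmlns:w=\"" + W + "\"><w:body><w:p><w:r><w:rPr>"
                + "<w:rFonts w:ascii=\"" + font + "\"/><w:sz w:val=\"" + halfPoints + "\"/>"
                + "</w:rPr><w:t>" + text + "</w:t></w:r></w:p></w:body></w:document>";
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Pass a path to a real .docx, or run without arguments to use the in-memory sample.
        try (InputStream in = args.length > 0
                ? Files.newInputStream(Paths.get(args[0]))
                : new ByteArrayInputStream(
                        tinyDocx("Arial", 28, "\u041f\u0440\u0438\u0432\u0435\u0442"))) {
            describeRuns(in).forEach(System.out::println);
        }
    }
}
```

Because the XML is parsed as UTF-8 rather than read byte-by-byte in the platform encoding, Cyrillic text survives intact. For anything beyond a quick check, a library that also resolves styles and handles .doc is still the more practical route.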

You can do this with POI, but you would have to do a lot of it by hand. I won't give you ready-made code for that, but I can suggest an easier option (although it does not work with every Word format): XDocReport, a library that, among other things, can convert docx (the XML version) to PDF. (This piece of text is not relevant to the example)

Here is an example of use:

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.OutputStream;

    import org.apache.poi.xwpf.converter.pdf.PdfConverter;
    import org.apache.poi.xwpf.converter.pdf.PdfOptions;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;

    import fr.opensagres.xdocreport.samples.docx.converters.Data;

    public class ConvertDocxBigToPDF {

        public static void main( String[] args ) {
            long startTime = System.currentTimeMillis();

            try {
                // 1) Load docx with POI XWPFDocument
                XWPFDocument document = new XWPFDocument( Data.class.getResourceAsStream( "DocxBig.docx" ) );

                // 2) Convert POI XWPFDocument 2 PDF with iText
                File outFile = new File( "target/DocxBig.pdf" );
                outFile.getParentFile().mkdirs();

                OutputStream out = new FileOutputStream( outFile );
                PdfOptions options = PdfOptions.create().fontEncoding( "windows-1251" );
                PdfConverter.getInstance().convert( document, out, options );
            } catch ( Throwable e ) {
                e.printStackTrace();
            }

            System.out.println( "Generate DocxBig.pdf with " + ( System.currentTimeMillis() - startTime ) + " ms." );
        }
    }
  • Is it really windows-1250 and not windows-1251? - gil9red
  • Corrected. An example from the repository. - Riĥard Brugekĥaim
  • Is it possible to learn more about the library behind import fr.opensagres.xdocreport.samples.docx.converters.Data;? - Jasmine
  • Honestly, I have not worked with it myself. It was simply recommended in the iText mailing list archive (there was another option, but it is paid). It is actually intended for generating reports, and besides that it can also convert to PDF. You will have to figure it out by trial and error. (there is no link to the answer) - Riĥard Brugekĥaim
  • @RiĥardBrugekĥaim I found this code on the Internet at javased.com/index.php?source_dir=xdocreport-demo/samples/... Data turned out to be a separate class, but it is empty there. Could you advise what to write in it? - Jasmine