I have a text editor (TinyMCE) in which the user edits posts. The result is HTML code, for example:

<table><tr><td>one</td><td>two</td></tr></table><p>Some text</p> 

The user clicks the "Generate document" button.

This HTML should then be inserted into a .docx file.

For this, the document contains a placeholder: [template].

More on this in the question "Word document. Replacing tags".

Question: how do I insert an actual table and paragraph into the .docx instead of the raw HTML code?

I have the following code:

    using System.IO;
    using DocumentFormat.OpenXml;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;
    using NotesFor.HtmlToOpenXml;
    ...
    static void Main(string[] args)
    {
        const string filename = "test.docx";
        string html = Properties.Resources.DemoHtml;

        if (File.Exists(filename)) File.Delete(filename);

        using (MemoryStream generatedDocument = new MemoryStream())
        {
            // Creates a brand-new document in memory instead of opening an existing one
            using (WordprocessingDocument package = WordprocessingDocument.Create(generatedDocument, WordprocessingDocumentType.Document))
            {
                MainDocumentPart mainPart = package.MainDocumentPart;
                if (mainPart == null)
                {
                    mainPart = package.AddMainDocumentPart();
                    new Document(new Body()).Save(mainPart);
                }

                // HtmlToOpenXml converts the HTML into Word elements and appends them to the body
                HtmlConverter converter = new HtmlConverter(mainPart);
                converter.ParseHtml(html);

                mainPart.Document.Save();
            }

            File.WriteAllBytes(filename, generatedDocument.ToArray());
        }

        System.Diagnostics.Process.Start(filename);
    }

It builds a new .docx file from HTML stored in a resource, but I need to insert the HTML into an existing file.

  • You need to insert the HTML markup, with its content, into the .docx document, do I understand correctly? - Vladislav Khapin
  • Yes, that's right. - endovitskiiy


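This is done with AltChunk and AlternativeFormatImportPart: the HTML is stored in the package as an alternative-format import part, and an altChunk element in the body references that part by its relationship id. Word converts the imported HTML into regular document content when it opens the file. The approach is described, more or less, in an MSDN article.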

    using System.IO;
    using System.Reflection;
    using System.Text;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    namespace TestC
    {
        class Program
        {
            static void Main(string[] args)
            {
                using (var document = WordprocessingDocument.Open(@"C:\Users\User\Documents\sample.docx", isEditable: true))
                // I embedded the HTML file as a separate assembly resource; this is not essential
                using (var htmlStream = Assembly.GetExecutingAssembly().GetManifestResourceStream("TestC.Sample.html"))
                {
                    var mainDocumentPart = document.MainDocumentPart;
                    var html = new StreamReader(htmlStream).ReadToEnd(); // HTML text
                    var htmlAsUtf8Bytes = Encoding.UTF8.GetBytes(html);

                    using (MemoryStream htmlContentStream = new MemoryStream(htmlAsUtf8Bytes))
                    {
                        // Store the HTML in the package as an import part with a known relationship id
                        string partId = "id";
                        AlternativeFormatImportPart formatImportPart =
                            mainDocumentPart.AddAlternativeFormatImportPart(
                                AlternativeFormatImportPartType.Html, partId);
                        formatImportPart.FeedData(htmlContentStream);

                        // The altChunk element in the body points to that part by its id
                        AltChunk altChunk = new AltChunk();
                        altChunk.Id = partId;
                        mainDocumentPart.Document.Body.Append(altChunk);
                    }
                }
            }
        }
    }

where Sample.html (taken from here) is:

    <HTML>
    <HEAD>
    <TITLE>Your Title Here</TITLE>
    </HEAD>
    <BODY>
    <HR>
    <a href="http://somegreatsite.com">Link Name</a> is a link to another nifty site
    <H1>This is a Header</H1>
    <H2>This is a Medium Header</H2>
    Send me mail at <a href="mailto:support@yourcompany.com"> support@yourcompany.com </a>.
    <P> This is a new paragraph!
    <P> <B>This is a new paragraph!</B>
    <BR> <B><I>This is a new sentence without a paragraph break, in bold italics.</I></B>
    <HR>
    </BODY>
    </HTML>

Output: [screenshot of the resulting Word document]

For your example, the HTML would be:

    <HTML>
    <head>
      <style>
        .table { width: 100%; border: 1px solid; border-collapse: collapse; }
        .table td { border: 1px solid black; }
      </style>
    </head>
    <BODY>
      <table class="table">
        <tr>
          <td>one</td>
          <td>two</td>
        </tr>
      </table>
      <p>Some text</p>
    </BODY>
    </HTML>

Output: [screenshot of the resulting Word document]
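The example above wraps the editor output in a complete HTML document by hand. A minimal sketch of doing the same in code, assuming the fragment arrives straight from TinyMCE as in the question; `HtmlWrapper` and `WrapFragment` are hypothetical names. The style rules are adapted from the example but target plain `table`/`td`, because the editor's `<table>` has no `class="table"`. The `<meta>` charset declaration is an added precaution (an assumption, not part of the original answer) so that Word does not have to guess the encoding of the UTF-8 bytes, which matters for non-ASCII text such as Cyrillic.

```csharp
using System;
using System.Text;

static class HtmlWrapper
{
    // Hypothetical helper: turns an editor fragment such as
    // "<table>...</table><p>Some text</p>" into a full HTML document
    // that can be fed into an AlternativeFormatImportPart.
    public static string WrapFragment(string fragment)
    {
        if (fragment == null) throw new ArgumentNullException(nameof(fragment));

        var sb = new StringBuilder();
        sb.Append("<html><head>");
        // Assumption: an explicit charset helps Word decode the UTF-8 bytes correctly
        sb.Append("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">");
        // Table borders adapted from the answer's example, applied to plain table/td
        sb.Append("<style>");
        sb.Append("table { width: 100%; border: 1px solid; border-collapse: collapse; } ");
        sb.Append("td { border: 1px solid black; }");
        sb.Append("</style>");
        sb.Append("</head><body>");
        sb.Append(fragment); // the editor's HTML is inserted unchanged
        sb.Append("</body></html>");
        return sb.ToString();
    }
}
```

The wrapped string can then be converted with `Encoding.UTF8.GetBytes` and passed to `FeedData` exactly as in the answer's code.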


UPD: replacing every paragraph whose entire text is [Html] with our HTML:

    using System;
    using System.IO;
    using System.Linq;
    using System.Reflection;
    using System.Text;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    namespace Test
    {
        class Program
        {
            static void Main(string[] args)
            {
                using (var document = WordprocessingDocument.Open(@".docx file", isEditable: true))
                // I embedded the HTML file as a separate assembly resource; this is not essential
                using (var htmlStream = Assembly.GetExecutingAssembly().GetManifestResourceStream("Test.Sample.html"))
                {
                    var mainDocumentPart = document.MainDocumentPart;
                    var documentBody = mainDocumentPart.Document.Body;
                    var html = new StreamReader(htmlStream).ReadToEnd();
                    var htmlAsUtf8Bytes = Encoding.UTF8.GetBytes(html);
                    Random random = new Random();

                    // ToList() so the tree can be modified while iterating
                    var paragraphsToReplace = documentBody.Descendants<Paragraph>()
                        .Where(x => x.InnerText.Equals("[Html]"))
                        .ToList();

                    foreach (var paragraph in paragraphsToReplace)
                    {
                        // Each placeholder gets its own import part with a random id
                        string partId = $"id_{random.Next()}";
                        AlternativeFormatImportPart formatImportPart =
                            mainDocumentPart.AddAlternativeFormatImportPart(
                                AlternativeFormatImportPartType.Html, partId);
                        using (MemoryStream htmlContentStream = new MemoryStream(htmlAsUtf8Bytes))
                        {
                            formatImportPart.FeedData(htmlContentStream);
                        }

                        // Put the altChunk where the placeholder paragraph was, then drop the paragraph
                        AltChunk altChunk = new AltChunk();
                        altChunk.Id = partId;
                        paragraph.InsertBeforeSelf(altChunk);
                        paragraph.Remove();
                    }
                }
            }
        }
    }
  • Thank you, that's what I needed. - endovitskiiy
  • I need the altChunk markup to be inserted in place of the [template] label. - endovitskiiy
  • @endovitskiiy I can't go into detail right now, but AltChunk sits at the same level as Paragraph, i.e. you need to find the Paragraph containing the label, call its InsertBeforeSelf() passing the AltChunk object from the example (instead of mainDocumentPart.Document.Body.Append()), and then delete the Paragraph with the label via its Remove() method. That way you have effectively replaced the label. I hope I got it right from memory. - Vladislav Khapin
  • OK, I get the idea; I'll try it. - endovitskiiy
  • So I understand that the entire paragraph containing the label is removed, not just the label itself. - endovitskiiy
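The UPD code hard-codes [Html] and picks part ids with Random. A minimal sketch of the same replacement as a reusable method, assuming the placeholder is [template] as in the question and that it fills a whole paragraph (as noted in the comments, the entire paragraph is removed). `PlaceholderReplacer` and `ReplaceWithHtml` are hypothetical names. Instead of Random, it calls the `AddAlternativeFormatImportPart` overload without an id and reads back the id the SDK assigned via `GetIdOfPart`; it also trims the paragraph text, which is a small deviation from the original `Equals` check.

```csharp
using System.IO;
using System.Linq;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class PlaceholderReplacer
{
    // Hypothetical helper: replaces every paragraph whose whole text is
    // `placeholder` (e.g. "[template]") with an altChunk holding `html`.
    // Returns how many paragraphs were replaced.
    public static int ReplaceWithHtml(WordprocessingDocument document, string placeholder, string html)
    {
        MainDocumentPart mainPart = document.MainDocumentPart;
        byte[] htmlBytes = Encoding.UTF8.GetBytes(html);

        // Materialize the list first, because the loop modifies the tree
        var targets = mainPart.Document.Body
            .Descendants<Paragraph>()
            .Where(p => p.InnerText.Trim() == placeholder)
            .ToList();

        foreach (Paragraph paragraph in targets)
        {
            // Let the SDK generate a unique relationship id for the part
            AlternativeFormatImportPart importPart =
                mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html);
            using (var stream = new MemoryStream(htmlBytes))
            {
                importPart.FeedData(stream);
            }

            // The altChunk takes the placeholder paragraph's position
            var altChunk = new AltChunk { Id = mainPart.GetIdOfPart(importPart) };
            paragraph.InsertBeforeSelf(altChunk);
            paragraph.Remove();
        }

        return targets.Count;
    }
}
```

It would be called inside the `using (var document = WordprocessingDocument.Open(..., isEditable: true))` block, e.g. `PlaceholderReplacer.ReplaceWithHtml(document, "[template]", html);`.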