I'm trying to get all the text from a .docx document:

    // requires: using System; using DocumentFormat.OpenXml.Packaging;
    using (var wordDocument = WordprocessingDocument.Open(fileName as string, false))
    {
        // get all the text
        var text = wordDocument.MainDocumentPart.Document.Body.InnerText;
        Console.WriteLine(text);
    }

The whole text is indeed read into the variable, but it comes out unformatted: the line breaks that are present in the Word file itself are missing from the output.

I assumed that the extracted text would at least preserve the line breaks, but it turned out not to be that simple.

What are the options for preserving the line breaks?

    1 answer

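    Try it this way, walking the element tree recursively and emitting a line break for each paragraph and break element: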

        // requires: using System; using System.Text; using DocumentFormat.OpenXml;
        public string GetPlainText(OpenXmlElement element)
        {
            StringBuilder text = new StringBuilder();
            foreach (OpenXmlElement section in element.Elements())
            {
                switch (section.LocalName)
                {
                    case "t":   // Text
                        text.Append(section.InnerText);
                        break;
                    case "cr":  // Carriage return
                    case "br":  // Break (a line break unless marked as a page break)
                        text.Append(Environment.NewLine);
                        break;
                    case "tab": // Tab
                        text.Append("\t");
                        break;
                    case "p":   // Paragraph
                        text.Append(GetPlainText(section));
                        // AppendLine adds a second line break, so paragraphs end up
                        // separated by a blank line; use text.AppendLine() for a single break.
                        text.AppendLine(Environment.NewLine);
                        break;
                    default:    // Any other container: recurse into its children
                        text.Append(GetPlainText(section));
                        break;
                }
            }
            return text.ToString();
        }

        // Call it inside the using block, while the document is still open.
        var text = GetPlainText(wordDocument.MainDocumentPart.Document.Body);
        Console.WriteLine(text);
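If soft line breaks and tabs inside a paragraph do not matter, a shorter alternative is to read the text paragraph by paragraph and join the pieces yourself. This is only a minimal sketch: the class name `ParagraphText` and the method name `Extract` are made up for illustration and are not part of the Open XML SDK, and it assumes the DocumentFormat.OpenXml package is referenced.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper for illustration; not part of the Open XML SDK.
public static class ParagraphText
{
    // Joins the InnerText of every paragraph under root with a line break.
    // Breaks and tabs inside a paragraph are not translated by this sketch.
    public static string Extract(OpenXmlElement root)
    {
        return string.Join(
            Environment.NewLine,
            root.Descendants<Paragraph>().Select(p => p.InnerText));
    }
}
```

Inside the `using` block from the question it could be called as `Console.WriteLine(ParagraphText.Extract(wordDocument.MainDocumentPart.Document.Body));`. Paragraphs inside tables are picked up as well, but text boxes nested inside a paragraph may show up more than once, so check the result on a representative document.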
