Here is some code that reads the text of a Word document:

    public void getWordTokens(string wordFilePath)
    {
        Object fileName = wordFilePath;
        Word.Application app = new Word.Application();
        var _tokenList = new List<string>();
        try
        {
            app.Documents.Open(ref fileName);
            Word.Document document = app.ActiveDocument;
            Console.WriteLine("Words count - " + document.Words.Count);
            Console.WriteLine("Document name - " + document.Name);
            StringBuilder res = new StringBuilder();
            // Word collections are 1-based, so the last valid index is Count
            for (int i = 1; i <= document.Words.Count; i++)
            {
                res.Append(document.Words[i].Text);
            }
            Console.WriteLine(res.ToString());
        }
        finally
        {
            app.Quit();
        }
    }

The code works correctly. On very small Word files it runs relatively quickly (not really, but tolerably). On large files, however, it becomes a nightmare: it takes a very long time.

How can this whole thing be improved?

P.S. When the program is terminated abnormally, the Word process does not exit, even though the call is wrapped in try/finally. Is that normal?

  • And what do you mean by abnormal termination? Via Task Manager? - VladD
  • @VladD Yes, either Task Manager or stopping it in the debugger. The point is that this code would take a very long time to process a large document (more than 10 minutes for sure; I did not have the patience to wait longer). - Ep1demic
  • Ah, well, then that is normal. A hard kill of the process does not let it execute the finally block; the process is simply terminated on the spot. - VladD
  • @VladD Thanks, but can you say anything about the main question? Perhaps there is some other way to get all the text? - Ep1demic
  • Unfortunately, I have not worked with Word, sorry. As an option, maybe reduce the number of calls? Try to get all the text in one go and split it into words yourself? - VladD

2 answers

You do not need loops. The text of the document can be retrieved in one call:

    private string GrabWordFileWords(string file_name)
    {
        string result = null;

        // Open the file
        object filename = file_name;
        object confirm_conversions = false;
        object read_only = true;
        object add_to_recent_files = false;
        object format = 0;
        object missing = System.Reflection.Missing.Value;
        try
        {
            WordApp._Document word_doc = _word_app.Documents.Open(
                ref filename, ref confirm_conversions, ref read_only, ref add_to_recent_files,
                ref missing, ref missing, ref missing, ref missing, ref missing,
                ref format,
                ref missing, ref missing, ref missing, ref missing, ref missing, ref missing);

            // Get the text from the file
            result = word_doc.Content.Text;

            object save_changes = false;
            word_doc.Close(ref save_changes, ref missing, ref missing);
        }
        catch (Exception ex)
        {
            // Errors are swallowed here; the method then returns null
        }
        return result;
    }
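Once the whole text has been fetched in a single call, splitting it into words can be done in managed code without touching Word again. Below is a minimal sketch of that step; the `WordTokenizer` class and `SplitIntoWords` method are hypothetical names, not part of the answer above, and "word" is assumed to mean a run of letters, digits, or underscores.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordTokenizer
{
    // Hypothetical helper: splits text that was fetched in one call
    // into words, entirely in managed code (no COM calls involved).
    public static List<string> SplitIntoWords(string text)
    {
        var words = new List<string>();
        if (string.IsNullOrEmpty(text))
            return words;

        // \w+ matches runs of letters, digits and underscores, so spaces,
        // punctuation and Word's paragraph marks ('\r') act as separators.
        foreach (Match m in Regex.Matches(text, @"\w+"))
            words.Add(m.Value);

        return words;
    }
}
```

It could be used as `var tokens = WordTokenizer.SplitIntoWords(GrabWordFileWords(path));`. The count will not necessarily match `document.Words.Count`, because Word's own collection also counts punctuation and paragraph marks as items.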

    You can do everything much faster if you do not use interop.

    I have not worked with Word myself, but offhand I can suggest two faster approaches:

    1. Use the OpenXML library. (In your case, you need to convert the doc to docx first.) OpenXML is NOT very intuitive, so you may want to look for a wrapper around it that makes it easier to work with. For Excel files that is ClosedXML; I do not know of one for Word.
    2. You can re-save the doc as docx with the .xml extension and use regular expressions to parse out the necessary data. Not the best idea, but much faster than interop.

    In both cases, a document of almost any size will be parsed in a split second.
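As a rough illustration of reading a docx without interop, here is a sketch that does not use the OpenXML SDK or regular expressions. It uses a swapped-in technique: it opens the docx as a ZIP archive with `System.IO.Compression` and reads the main part with LINQ to XML. The `DocxTextReader` class is a hypothetical name, and the code assumes the main part lives at `word/document.xml`, which is where Word puts it.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

public static class DocxTextReader
{
    // WordprocessingML main namespace used by the elements in word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns the body text, one line per paragraph.
    public static string ReadText(Stream docxStream)
    {
        using (var zip = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Assumption: the main document part is stored at this path.
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; is this a .docx file?");

            XDocument xml;
            using (Stream s = entry.Open())
                xml = XDocument.Load(s);

            var sb = new StringBuilder();
            foreach (XElement paragraph in xml.Descendants(W + "p"))
            {
                // Text runs are stored in <w:t> elements inside each paragraph.
                foreach (XElement t in paragraph.Descendants(W + "t"))
                    sb.Append(t.Value);
                sb.AppendLine();
            }
            return sb.ToString();
        }
    }

    // Convenience overload for a file on disk.
    public static string ReadText(string path)
    {
        using (FileStream fs = File.OpenRead(path))
            return ReadText(fs);
    }
}
```

This only covers plain body text. Headers, footers, tabs, and line breaks are ignored, and paragraphs nested inside other paragraphs (text boxes, for example) may appear twice, so treat it as a starting point rather than a complete extractor.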

    • Regular expressions? - VladD
    • For most cases they are fine. And cases like the ones shown there do not come up that often :) - Andrew
    • It is better not to look inside the docx archive. It is hell in there :) - Vladislav Khapin