The File class provides a set of static methods that simplify typical file operations. For example, File.ReadAllText opens the file (if possible), detects the encoding the text was saved in, reads the contents and decodes them into a string, closes the file, and returns the text. Without ReadAllText, you would have to code all of these steps by hand.
The FileStream class represents a file that is open for reading or writing. It gives you far more control over the file than the File class does, but the work has to be coded by hand. For example, this code does roughly the same job as ReadAllText:
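    using System.IO;
    using System.Text;

    public static string ReadAllText(string filePath)
    {
        // The using block closes the file even if reading throws.
        using (FileStream stream = File.OpenRead(filePath))
        {
            // UTF-8 is the fallback; StreamReader switches encoding if it finds a byte order mark.
            var encoding = new UTF8Encoding(true);
            var reader = new StreamReader(stream, encoding);
            return reader.ReadToEnd();
        }
    }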
This approach is useful when you need to do more than just read all the text from a file.
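As an illustration of "more than just reading all the text", here is a minimal sketch that reads only the first few bytes of a file, something the File class has no one-line helper for. The class name FileStreamExamples and the method ReadHeader are hypothetical names chosen for this example.

```csharp
using System;
using System.IO;

public static class FileStreamExamples
{
    // Hypothetical helper: returns at most `count` bytes from the start of the file.
    public static byte[] ReadHeader(string filePath, int count)
    {
        using (FileStream stream = File.OpenRead(filePath))
        {
            var buffer = new byte[count];
            int total = 0;

            // Read may return fewer bytes than requested, so loop until
            // the buffer is full or the end of the file is reached.
            while (total < count)
            {
                int read = stream.Read(buffer, total, count - total);
                if (read == 0)
                {
                    break; // end of file
                }
                total += read;
            }

            // Trim the buffer if the file was shorter than `count` bytes.
            Array.Resize(ref buffer, total);
            return buffer;
        }
    }
}
```

A call such as FileStreamExamples.ReadHeader(path, 4) can be used to inspect a file signature without loading the whole file into memory.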
As for *.docx files, they do not contain plain text at all. They contain a document that, in addition to the text, holds formatting markup, images, tables, and so on. Such a document is saved in a dedicated format (for *.docx this is Office Open XML, a ZIP package of XML parts), so reading the file as text will not give you the document's text.
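To show why ReadAllText is not enough here, the following rough sketch opens a .docx file as a ZIP archive and pulls the text out of its main part using only the base class library. It assumes the usual package layout (a word/document.xml entry and the WordprocessingML namespace); the names DocxTextSketch and ReadDocxText are hypothetical. It ignores headers, footnotes, and nested content such as text boxes, so treat it as a demonstration of the format rather than a replacement for a proper library. On .NET Framework, ZipFile also requires a reference to the System.IO.Compression.FileSystem assembly.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxTextSketch
{
    // Namespace used by WordprocessingML elements such as w:p and w:t.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns the body text, one line per paragraph.
    public static string ReadDocxText(string filePath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(filePath))
        {
            // Assumption: the main document part lives at the usual path.
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
            {
                throw new InvalidDataException("word/document.xml not found.");
            }

            using (Stream xml = entry.Open())
            {
                XDocument document = XDocument.Load(xml);

                // Each w:p is a paragraph; its visible text sits in w:t elements.
                var paragraphs = document
                    .Descendants(W + "p")
                    .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)));

                return string.Join(Environment.NewLine, paragraphs);
            }
        }
    }
}
```

Even this simple case needs ZIP handling and XML parsing, which is why a dedicated library is the more practical choice for real documents.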
Microsoft has published format specifications for all documents in the Microsoft Office line, and you can download them from this page: Technical Documents. But the DOC format specification alone is a PDF file of almost 20 megabytes. Therefore, to read the text from a *.docx file, it is better to use dedicated libraries, such as those suggested in the comments by And and rdorn:
Example of using Word Processing (Open XML SDK)
Microsoft.Office.Interop usage example
Description of RichEditDocumentService component from DevExpress
Syncfusion WordDocument Feature Description