Good day. I am confused about which class lets you display a file's contents in a text box.

Is that class File, or is it FileStream? With File I managed to read the file using the ReadAllText method, but FileStream has no similar methods, so I assume it is not meant for opening files for display at all.

My second question is how to correctly open documents such as .docx so that the content appears in readable form (as .txt files do when opened in a text box) rather than as garbled characters.

  • 1
    Try digging into WordprocessingDocument: msdn.microsoft.com/ru-ru/library/office/ff478255.aspx - And
  • 2
    File and FileStream are classes that give access to a file's contents; displaying those contents is not their job. As for processing Word files, there are several options: interop, OOXML, and so on. If you are willing to pay, look at SyncFusion and DevExpress; I believe they have ready-made controls for this. - rdorn

2 answers

The File class contains a set of methods that simplify typical file operations. For example, the File.ReadAllText method opens a file, tries to detect the encoding the text was saved with, reads the contents and decodes them into text, closes the file, and returns the text. All of these steps could be coded by hand if ReadAllText did not exist.
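As a minimal sketch of the one-call approach described above (the class name FileDemo and the method name LoadText are illustrative assumptions, not part of any library):

```csharp
using System.IO;

public static class FileDemo
{
    // File.ReadAllText opens the file, detects the encoding from a byte order
    // mark (falling back to UTF-8), reads everything, and closes the file.
    public static string LoadText(string path)
    {
        return File.ReadAllText(path);
    }
}
```

In a Windows Forms application the result could then be assigned to a text box, for example `textBox1.Text = FileDemo.LoadText(path);`, where textBox1 is a hypothetical control name.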

The FileStream class represents a file that is open for reading or writing, and it offers far more ways of working with a file than the File class does. That work has to be coded by hand. For example, this code does roughly the same work as ReadAllText:

    using System.IO;
    using System.Text;

    public static string ReadAllText(string filePath)
    {
        // Both the stream and the reader are closed when the using blocks end.
        using (FileStream stream = File.OpenRead(filePath))
        using (var reader = new StreamReader(stream, new UTF8Encoding(true)))
        {
            return reader.ReadToEnd();
        }
    }

This approach is useful when you need to do more than just read all the text from a file.
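One case where the manual approach pays off is reading only part of a file. The sketch below returns just the first few lines; the class name FileStreamDemo, the method name ReadFirstLines, and the choice of UTF-8 are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class FileStreamDemo
{
    // Returns at most maxLines lines from the start of the file.
    // Reading stops as soon as enough lines have been collected,
    // so the whole file is not read.
    public static List<string> ReadFirstLines(string filePath, int maxLines)
    {
        var lines = new List<string>();
        using (FileStream stream = File.OpenRead(filePath))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            string line;
            while (lines.Count < maxLines && (line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

This could serve as a preview of a large log file, which File.ReadAllText would load into memory in full.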

As for *.docx files, they do not contain plain text at all but a document that, besides text, includes formatting markup, images, tables, and so on. A special format (Office Open XML) is used to save such a document to a file.

Microsoft has published format specifications for all documents in the Microsoft Office line, and you can download them from this page: Technical Documents. But the DOC format specification alone is a PDF file of almost 20 megabytes. Therefore, to read the text from a *.docx file, it is better to use dedicated libraries, such as those suggested in the comments by And and rdorn:

Example of using Word Processing (Open XML SDK)

Microsoft.Office.Interop usage example

Description of RichEditDocumentService component from DevExpress

Syncfusion WordDocument Feature Description

  • I heard that Microsoft published open documentation on this format because they wanted to standardize it through ISO. True, people who tried to write something based on that documentation complained about its quality: it has many gaps and inaccuracies. - Bulson
  • @Bulson, you are absolutely right. I even managed to find this documentation. Looks like this is what you're talking about: Office File Formats . I added this to my answer. - Uranus

1) FileStream, like IsolatedStorageStream, MemoryStream, and NetworkStream, is one of the so-called "backing store streams": it treats data as a sequence of bytes tied to a specific store, such as the file system, process memory, or a network resource. To work with a specific data format, that is, to give that sequence of bytes some meaning, the stream is "wrapped" in a so-called "stream adapter": StreamReader and StreamWriter for text, BinaryReader and BinaryWriter for basic types ( int, float, string ), and XmlReader and XmlWriter for XML.

2) Take any .docx file and change its extension to .zip, and you can look under the hood of the format. You will see that it consists of several directories containing XML files. Hence the conclusion: either you write your own library for working with this format or, as @rdorn suggests in the comments, you buy a ready-made one.
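Both points can be sketched in a few lines. The first method wraps a MemoryStream (a backing store) in BinaryWriter and BinaryReader adapters. The second treats a .docx stream as a ZIP archive and pulls the text out of word/document.xml. The class and method names are illustrative assumptions, and the text extraction is deliberately naive: it ignores headers, footnotes, tabs, and line breaks inside paragraphs. On .NET Framework the System.IO.Compression and System.Xml.Linq assemblies need to be referenced.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

public static class StreamSketch
{
    // Point 1: BinaryWriter and BinaryReader give meaning (an int, a string)
    // to the raw bytes held by a MemoryStream backing store.
    public static string RoundTripBinary(int number, string text)
    {
        using (var store = new MemoryStream())
        {
            using (var writer = new BinaryWriter(store, Encoding.UTF8, leaveOpen: true))
            {
                writer.Write(number);
                writer.Write(text);
            }
            store.Position = 0; // rewind before reading the same bytes back
            using (var reader = new BinaryReader(store, Encoding.UTF8, leaveOpen: true))
            {
                int n = reader.ReadInt32();
                string s = reader.ReadString();
                return n + ":" + s;
            }
        }
    }

    // WordprocessingML namespace used by the elements of word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Point 2: opens the .docx stream as a ZIP archive and returns the text
    // of word/document.xml, one line per paragraph element.
    public static string ExtractDocxText(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found");

            XDocument xml;
            using (Stream part = entry.Open())
                xml = XDocument.Load(part);

            var sb = new StringBuilder();
            foreach (XElement p in xml.Descendants(W + "p"))
            {
                foreach (XElement t in p.Descendants(W + "t"))
                    sb.Append(t.Value);
                sb.AppendLine();
            }
            return sb.ToString();
        }
    }
}
```

A caller could pass `File.OpenRead(path)` to ExtractDocxText and assign the result to a text box. For anything beyond a quick look at the text, the libraries mentioned in the comments are the safer choice.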