Is it true that you cannot insert data anywhere except at the end of a file without rewriting the file? What do standard XML parsers / XML editors (in any programming language) do, for example, if I insert a field value in the middle of a file? Do they rewrite the entire file, or only the part after the edited place?
4 answers
Yes and no. Generally it is true. From the application's point of view, a file is one contiguous sequence of bytes on disk. Writing into the middle inevitably overwrites the data already there. Programs usually either rewrite the entire file, or rewrite it from the point of change to the end.
However, there is a way to insert data into the middle of a file without rewriting it. Data on disk is stored in blocks, and a file is a collection of blocks; either a block map for the file is stored somewhere, or each block contains a link to the next one. You can create a new block, write the data to it, and update the map by inserting a link to that block. This approach has a drawback, though: either the inserted data must come in blocks of a fixed size, or only part of a block is used and some disk capacity is wasted.
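A minimal sketch of the difference, assuming a small file that fits in memory. The function names overwrite_at and insert_by_rewriting_tail are made up for this illustration: the first writes into the middle and clobbers what was there, the second rewrites everything from the point of change to the end.

```python
def overwrite_at(path, offset, data):
    """Write `data` at `offset`; the bytes already there are replaced."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def insert_by_rewriting_tail(path, offset, data):
    """Insert `data` at `offset` by rewriting the file from `offset` on."""
    with open(path, "r+b") as f:
        f.seek(offset)
        tail = f.read()          # everything after the insertion point
        f.seek(offset)
        f.write(data + tail)     # new data first, then the old tail
```

The head of the file (everything before offset) is never touched in either case.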
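The block-map idea can be sketched as a toy in-memory model. This is only an illustration, not how any particular file system or DBMS does it; BLOCK_SIZE and the BlockMappedFile class are hypothetical names.

```python
BLOCK_SIZE = 8  # hypothetical block size, tiny on purpose


class BlockMappedFile:
    """Toy model: a file is an ordered map (list) of blocks."""

    def __init__(self, data=b""):
        # Split the initial content into blocks of at most BLOCK_SIZE bytes.
        self.blocks = [data[i:i + BLOCK_SIZE]
                       for i in range(0, len(data), BLOCK_SIZE)]

    def insert_block(self, index, data):
        """Insert a new block into the map; existing blocks are not moved."""
        if len(data) > BLOCK_SIZE:
            raise ValueError("data does not fit into one block")
        self.blocks.insert(index, data)

    def read(self):
        """Return the logical content by walking the block map in order."""
        return b"".join(self.blocks)

    def wasted(self):
        """Bytes lost to partially filled blocks."""
        return sum(BLOCK_SIZE - len(b) for b in self.blocks)
```

Inserting a 3-byte block between two full blocks leaves the old blocks untouched, but wasted() then reports 5 unused bytes, which is exactly the drawback described above.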
- The "block" (cluster) method is used in many DBMSs (not all). XML has no block structure. The BIFF container (doc, xls, and some other office formats) also supports blocks; the newer Office formats dropped it. A file format specification either provides for a breakdown into addressable blocks or it does not (in MP3, for example, they are not addressable). - nick_n_a
- fallocate(2) on Linux supports insertion / deletion without rewriting (with restrictions). When working with XML, though, it is very likely that everything after the edited place is simply rewritten. To preserve the original in case of an error, or for large files, a temporary file can be used: the new content is written there and, on success, the file is renamed at the end: edit text file using Python - jfs
- Individual characters can be overwritten in place just as well, as long as the new data is exactly as long as the data being overwritten. - Harry
- @Harry for clarity: calls like fallocate() do more than let you overwrite individual bytes in place without copying; this function lets you change the length of the file (insertion / deletion, including in the middle, with restrictions). - jfs
Almost no tools, standard or otherwise, work directly with the disk or with the volume's file system; they send requests to the operating system's file manager. That layer in turn translates file operations into disk operations (and vice versa), initiating reads or writes of physical sectors.
If the file is read into process memory in full and then written back in full, the disk subsystem will simply do what was asked and write the new data over the old, even where nothing has changed (I mean the head of the file up to the point of change). That is cheaper than comparing what was there with what it has become.
But if the process reads some "window", makes the changes in it, and then flushes the new state to disk, only the sectors in that window are overwritten. To avoid corrupting the file, it is then the application that has to read the following "windows", shift the data in them, and write their new state, and so on until the end of the file. The process, as you can see, is expensive, while memory is now cheap, so the vast majority of applications do not bother with such piecemeal work: they read the entire file before the change and write the entire file after it.
And to avoid running into problems, it is preferable not to overwrite the file but to write a new one, and then rename it and / or delete the old one. Finding free space for a file, contiguous if possible, is also an inexpensive operation, so the new file will most likely not be physically located where the file was before the change. At the same time, this technique is also a passive way of dealing with fragmentation.
Standard parsers / XML editors read the file into RAM in full, and then write it back to disk in full as well. For a computer this is much simpler and faster. Working directly against the hard drive would be VERY slow.
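Writing a new file and renaming it over the old one can be sketched as follows. The helper name rewrite_via_temp and the transform callback are assumptions made for this example; os.replace and tempfile.mkstemp are standard library calls.

```python
import contextlib
import os
import tempfile


def rewrite_via_temp(path, transform):
    """Replace the content of `path` with transform(old_content).

    The new content goes to a temporary file in the same directory and
    is renamed over the original only after it was written successfully.
    """
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src:
        new_content = transform(src.read())
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_content)
            tmp.flush()
            os.fsync(tmp.fileno())   # push the data to disk before renaming
        os.replace(tmp_path, path)   # swap the new file in place of the old
    except BaseException:
        # On any failure the original file is left as it was.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
```

The temporary file is created in the same directory so that the rename stays on one file system; if transform or the write fails, the original is untouched.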
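For illustration, this is roughly what the read-everything, write-everything cycle looks like with Python's xml.etree.ElementTree. The function name set_field and the tag used in the usage note are made up for the example; other parsers may behave differently.

```python
import xml.etree.ElementTree as ET


def set_field(path, tag, value):
    """Load the whole document, change one element, write it all back."""
    tree = ET.parse(path)            # the entire file is parsed into memory
    element = tree.getroot().find(tag)
    if element is None:
        raise KeyError(tag)
    element.text = value
    tree.write(path)                 # the entire file is serialized again
```

Calling set_field("config.xml", "field", "new") on a file with a field element under the root rewrites the complete file, even though only one value changed.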
- And if the file size is several gigabytes? - Harry
- @Harry then XML is a bad storage format - Chorkov
- @Harry If the file is several gigabytes, it is better to look for another storage format. Alternatively, you can try splitting the XML into several parts and loading into memory only some header or index words for each part. Using those words, find the part you need and load it into RAM to work with it. At the very end, merge the files back into one big XML. This is not a very good option in terms of speed either, but it is at least some solution. - evilnw
- @evilnw, can you link to some standard documentation where this is described? - G0ohan
- @G0ohan, unfortunately, I can't find it anymore. A long time ago I read an article where they tried to get around a RAM limit in a similar way, though things were much worse there. There was a device with only 5 MB of RAM, and it had to parse various tables and texts that could take 10-20 MB each. The main file was split into a bunch of small ones, and index words were attached to each small file so that the necessary part could be found quickly. Then the program opened that small file and read / wrote the necessary values. At the end, all the files were assembled back into one. - evilnw
Yes, that is true.
It depends on the implementation. It is quite possible that the file is read into memory in full, processed, and written back to disk in full.