For example: a 250 GB disk with a 200 GB file on it, and the file needs to be split into two 100 GB files. The file system can be anything; any way of doing this is of interest.

  • You could split the disk )) - Eugene Bartosh
  • @Mike, any FS, any OS, any tools. - user239133

3 answers

If your file system supports sparse files, the general algorithm is as follows: open a new file, move the file pointer to where the last N megabytes of the source file should go, and copy those N megabytes there. Then close the source file and truncate it by the size of the copied part (the truncate()/ftruncate() system calls). Open the files again and copy the next-to-last N megabytes, positioning the file pointer in the destination file accordingly. Repeat until everything has been copied. It is best to write in large chunks, so the file does not end up badly fragmented, and in chunks that are a multiple of the file system block size (usually 4 KB). With this approach each part of the file is rewritten only once and goes straight to its final place, because a sparse file occupies only as much disk space as has actually been written into it, regardless of the offsets at which the writes were made: when an earlier part is written, the FS simply adds blocks at its own level to the beginning of the chain of blocks occupied by the file.

On unix/linux operating systems these operations can be done from the command line: the dd utility writes a portion of the file at the required offset, and truncate cuts the source file down to the required length. Be careful and attentive, you have been warned :)

Something like this:

 dd if=файл1 of=файл2 bs=1M count=51200 skip=153600 seek=51200 conv=notrunc
 truncate -s 161061273600 файл1
 dd if=файл1 of=файл2 bs=1M count=51200 skip=102400 seek=0 conv=notrunc
 truncate -s 107374182400 файл1
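
Generalizing the commands above into a loop, a minimal sketch might look like this (assuming bash, GNU dd and truncate, and the sizes from the example: a 200 GiB source, 100 GiB to move, 50 GiB chunks; the file names are just placeholders):

 #!/bin/bash
 # Sketch only: move the second half of SRC into DST, chunk by chunk,
 # starting from the end, truncating SRC after every chunk so that the
 # extra space needed at any moment never exceeds one chunk.
 SRC=файл1
 DST=файл2
 TOTAL_MB=204800             # source size: 200 GiB
 HALF_MB=$((TOTAL_MB / 2))   # 100 GiB stays in SRC
 CHUNK_MB=51200              # 50 GiB per step (assumed to divide HALF_MB evenly)

 off=$TOTAL_MB
 while [ "$off" -gt "$HALF_MB" ]; do
   off=$((off - CHUNK_MB))
   # copy SRC MiB range [off, off+CHUNK_MB) to DST at offset (off - HALF_MB) MiB
   dd if="$SRC" of="$DST" bs=1M count="$CHUNK_MB" \
      skip="$off" seek=$((off - HALF_MB)) conv=notrunc
   # cut the copied tail off the source
   truncate -s $((off * 1024 * 1024)) "$SRC"
 done

conv=notrunc matters here: without it, dd truncates the output file around the seek offset, which on the second pass would throw away the chunk that was already copied into файл2.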

    Do I understand correctly that the files need to be split into several parts, and then some toolkit should provide transparent access to them, as if they were ordinary files? I.e., ideally the user does not even realize that the file is spread over several disks, right?

    RAID-like systems, apparently, are not being considered. I have only a few ideas for implementing this:

    1. Just write a function that reads the pieces of the file, stored on different disks, one after another. Information about the pieces is kept in a text file. Splitting is done by another function of your own, which simply divides the file, writes the parts to different disks and updates the text file (a command-line sketch of this idea follows after the list). Pros: the simplest implementation. Cons: it can be used only in your own programs; third-party programs cannot read such files.
    2. Intercept the system functions for working with files. The interceptor can react to a specific extension or to the file name; it needs to know the exact sizes of the parts stored on the different disks and serve the data depending on the offset from the beginning of the file being read. Such files are created by a special utility that splits the file and records the split parameters in a data file, which the interceptor reads regularly. Pros: not very difficult to implement, and third-party programs can also read such files transparently.
    3. Your own file system. Rework something like FAT32 so that the file allocation table can refer not only to clusters of the current disk but also to clusters of other disks (or not even to clusters, simply to files on other disks). The idea is attractive because the FS of the other disks does not have to change, as long as the beginning of the file is located on this disk. Pros: third-party programs can read such files transparently. Cons: hard to implement, you have to write your own driver (I don't know whether any FAT32 driver sources are available online).
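
    A minimal command-line sketch of idea 1 (all paths, names and sizes below are made up for illustration): cut the file into parts stored on different disks, then read them back in sequence. Note that this copies the data rather than splitting in place, so it does not by itself solve the free-space problem from the question.

     dd if=bigfile of=/disk1/bigfile.part0 bs=1M count=102400 skip=0        # first 100 GiB
     dd if=bigfile of=/disk2/bigfile.part1 bs=1M count=102400 skip=102400   # second 100 GiB
     cat /disk1/bigfile.part0 /disk2/bigfile.part1 > restored_bigfile       # read the parts in order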
    • If you look at the question from that side; although it seems to me the question is about something else. This can be done in Linux with standard tools: with losetup we turn files into block devices, with mdadm we build a RAID 0 out of those devices, and then work with the /dev/mdX device as with a single file. Besides, there are all kinds of volume managers that can also be given files as devices - Mike
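
    A minimal sketch of what the comment above describes (the device names, file names and sizes are placeholders; requires root):

     truncate -s 100G /disk1/part1.img       # two container files,
     truncate -s 100G /disk2/part2.img       # possibly on different disks
     losetup /dev/loop0 /disk1/part1.img     # expose the files as block devices
     losetup /dev/loop1 /disk2/part2.img
     mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/loop0 /dev/loop1
     mkfs.ext4 /dev/md0                      # then use /dev/md0 as one large device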

    In theory, the file can be divided into blocks. For example:

    250 GB (disk size) - 200 GB (file size) = 50 GB of free space to work with per operation.

    Then we take 40 GB (a bit less than the available 50, to leave a margin) from the end of the first file and write it into a second file, then cut the first one down. We get two files: 160 GB and 40 GB.

    Two more iterations and the file is split:

    1. 120 and 80
    2. 100 and 100
    • I suspect that this is not quite what the OP needs. - Viktor Tomilov
    • @Viktor Tomilov, as I understand it, the question is theoretical. The phrase "the file system can be anything" makes it even more theoretical, so I think the answer is correct. - Sergey Ignakhin