Hello everyone. I need to calculate how much space a folder with files will occupy on a disk formatted with FAT32. I know the cluster size of the file system, which makes it easy to calculate the space needed to hold the contents of all the files:

    do {
        if (fileinfo.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) {
            // Skip the "." and ".." pseudo-entries and recurse into real subfolders.
            if (wcscmp(fileinfo.cFileName, L".") != 0 && wcscmp(fileinfo.cFileName, L"..") != 0) {
                StringW path2;
                path2.Format(L"%s\\%s", path, fileinfo.cFileName);
                size += CalculateFatSize(path2, files, dirs);
            }
        } else {
            fileSize = ((__int64)fileinfo.nFileSizeHigh << 32) | fileinfo.nFileSizeLow;
            // Round the file size up to a whole number of clusters.
            // 64-bit on purpose: the product below can exceed 32 bits for files close to 4 GB.
            __int64 clustersInFiles = fileSize / clusterSize + (fileSize % clusterSize == 0 ? 0 : 1);
            // Note: this charges one full cluster even for a zero-length file.
            size += (fileSize < clusterSize) ? clusterSize : clustersInFiles * clusterSize;
        }
    } while (FindNextFileW(hFile, &fileinfo) != 0);

But as far as I know, for each file there is a record in the FAT table that holds information about the clusters containing the file's data, and so on. Moreover, the size of this record is not fixed when the file has a long name (and I am not sure it is fixed for a short name either). So it turns out that the calculation also has to account somehow for the size of the FAT table entry of each file. So the actual question is: how do I determine this size?

Thanks.

  • Do you need the size of the directory entries plus the files in them, or just the actual space occupied by the files? - Vladimir Martyanov
  • Vladimir, I am interested in the size of the directory together with the files in it. The actual space occupied by the files alone seems easy enough to calculate. - rudolfninja

1 answer

A note up front: everything below is about FAT32, even where I just write FAT.


The FAT table, located at the beginning of the disk, contains only general information about the disk and the "cluster chain" table, which says, for each cluster, which cluster comes next in its chain. This table is allocated in advance and has a constant size (for a given volume size and cluster size). It is always on the disk, and its space is always "taken". So there is probably no point in charging any of its space to individual files.
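To illustrate why this size is fixed, here is a rough sketch of how the minimum table size follows from the cluster count alone. The function name and parameters are mine, and the defaults (512-byte sectors, two copies of the table) are assumptions about a typical volume; a real formatter may reserve more.

```cpp
#include <cstdint>

// Minimum number of bytes taken by the FAT copies on a FAT32 volume.
// clusterCount:   number of data clusters on the volume (assumed known).
// bytesPerSector: assumed 512 here, the usual value.
// fatCopies:      assumed 2 here, the usual value.
std::uint64_t Fat32TableBytes(std::uint64_t clusterCount,
                              std::uint32_t bytesPerSector = 512,
                              std::uint32_t fatCopies = 2)
{
    // One 4-byte entry per cluster; entries 0 and 1 are reserved.
    const std::uint64_t rawBytes = (clusterCount + 2) * 4;
    // Each copy of the table occupies a whole number of sectors.
    const std::uint64_t sectors = (rawBytes + bytesPerSector - 1) / bytesPerSector;
    return sectors * bytesPerSector * fatCopies;
}
```

Nothing in the calculation depends on how many files exist or how long their names are, which is the point: the result only changes when the volume is reformatted.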

Next, folders. These are regular files with a special attribute, whose contents are specially structured data, 32 bytes per record. A file with a particularly long name can take several records (the LFN mechanism). But all of these records are stored in the directory files. If you can get the exact size of a directory file with some low-level tool, you will not need anything more.
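For long names, a rough way to count the records: each LFN record holds up to 13 UTF-16 characters of the name, and the file still gets its ordinary short (8.3) record on top of those. A minimal sketch under that assumption follows; the function names are mine, and `needsLfn` is a hypothetical flag the caller has to supply, because deciding whether a name fits the 8.3 form has its own rules that are not covered here. It also assumes `name.size()` counts UTF-16 units, as it does with `wchar_t` on Windows.

```cpp
#include <cstddef>
#include <string>

// Number of 32-byte directory records one file or folder occupies.
// needsLfn: hypothetical flag, true when the name cannot be stored
//           as a plain 8.3 short name.
std::size_t DirEntriesForName(const std::wstring& name, bool needsLfn)
{
    const std::size_t kCharsPerLfnEntry = 13; // UTF-16 units per LFN record
    if (!needsLfn)
        return 1; // just the short record
    const std::size_t lfnEntries =
        (name.size() + kCharsPerLfnEntry - 1) / kCharsPerLfnEntry;
    return lfnEntries + 1; // LFN records plus the short record
}

// The same thing expressed in bytes of the directory file.
std::size_t DirBytesForName(const std::wstring& name, bool needsLfn)
{
    return DirEntriesForName(name, needsLfn) * 32;
}
```

For example, a 27-character name that needs LFN takes 3 LFN records plus the short one, that is, 4 records or 128 bytes, while a name like 000000 takes a single 32-byte record.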

The space occupied is therefore the sum of the sizes of all the folder files and of the files they point to, recursively, with each one rounded up to a whole number of clusters.
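The recursion can be sketched roughly like this. `Node`, `MakeFile`, `MakeDir` and `OccupiedBytes` are a hypothetical in-memory model of mine, not Win32 structures; in real code the tree would come from `FindFirstFileW` / `FindNextFileW`. The sketch assumes that every non-root folder starts with the "." and ".." records, that a folder always owns at least one cluster, and that the folder file contains no leftover records from deleted files (a real one may be larger because of those).

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical in-memory model of a folder tree.
struct Node {
    bool isDirectory = false;
    std::uint64_t fileSize = 0;       // files only: content size in bytes
    std::uint64_t entriesForName = 1; // 32-byte records this node takes in its parent
    std::vector<Node> children;       // folders only
};

Node MakeFile(std::uint64_t size)
{
    Node n;
    n.fileSize = size;
    return n;
}

Node MakeDir(std::vector<Node> children)
{
    Node n;
    n.isDirectory = true;
    n.children = std::move(children);
    return n;
}

std::uint64_t RoundUpToCluster(std::uint64_t bytes, std::uint64_t clusterSize)
{
    return (bytes + clusterSize - 1) / clusterSize * clusterSize;
}

// Space taken by `node` and everything below it. The node's own record
// lives inside its parent's folder file and is counted there.
std::uint64_t OccupiedBytes(const Node& node, std::uint64_t clusterSize,
                            bool isRoot = false)
{
    if (!node.isDirectory)
        return RoundUpToCluster(node.fileSize, clusterSize);

    // Assumption: a non-root folder begins with the "." and ".." records.
    std::uint64_t entries = isRoot ? 0 : 2;
    std::uint64_t total = 0;
    for (const Node& child : node.children) {
        entries += child.entriesForName;
        total += OccupiedBytes(child, clusterSize);
    }
    // The folder file itself: 32 bytes per record, at least one cluster.
    total += std::max(RoundUpToCluster(entries * 32, clusterSize), clusterSize);
    return total;
}
```

With 4096-byte clusters, `OccupiedBytes(MakeDir({MakeFile(6), MakeFile(6), MakeFile(6)}), 4096)` gives 16384: three clusters for the file contents and one for the folder file.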


For example, for the structure:

 folder
 |- 000000
 |  ...
 |  ...
 |- 999999

... where each of the files 000000 - 999999 (a million files in total) contains 6 bytes of data (its own name), the data will be laid out as follows:

  • (1 000 000 clusters) One cluster for the contents of each of the files 000000 - 999999; the actual disk space consumed obviously depends on the cluster size.
    • This covers only the contents of the files. The data about the files comes next.
    • 6 bytes is obviously smaller than any cluster size possible in FAT32, so each file takes exactly one cluster.
  • (32 000 000 bytes) The data about the files is stored as 32-byte records in the folder's own file, the one with the directory attribute, as its contents, simply one after another. The file system API may not let you open it as an ordinary file, but internally it is just a binary file.

    • Actual disk space consumed:

       cluster_size * round_up(32_000_000 / cluster_size)
  • (0 bytes of used space) The folder file above does not fit into one cluster. So when reading it, the file system has to look in the FAT table (at the beginning of the disk) for the number of the cluster where the file continues.

    • But since this table is allocated in advance, when the file system is created, there is little point in counting its size.
  • (32 bytes, maybe) Information about the folder itself is stored in its parent folder, among the records for the other files and folders next to it (FAT, in case you have not guessed, uses the same mechanism to store them).
    • It is needed to hold the number of the first cluster of the folder's file list.
    • ... but if the folder is the root of the disk, that number is stored in the disk header, which is part of the file system's service structures, so there is no point in counting it. In that case the corresponding 32-byte record does not exist.
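Putting the items above into numbers, here is a minimal sketch. The function name and parameters are mine. It assumes short names (one 32-byte record per file), non-empty files no larger than one cluster, and, like the list above, it ignores the "." and ".." records. As far as I know, real FAT32 caps a single folder file at 65 536 records, so the million-file case should be read as arithmetic only.

```cpp
#include <cstdint>

// Estimated space for a flat folder of small files with short (8.3) names.
std::uint64_t FlatFolderBytes(std::uint64_t fileCount,
                              std::uint64_t clusterSize,
                              bool folderIsRoot)
{
    const std::uint64_t kEntrySize = 32;
    // One cluster for the contents of each (non-empty, small) file.
    const std::uint64_t fileBytes = fileCount * clusterSize;
    // The folder file: one 32-byte record per file, rounded up to whole clusters.
    const std::uint64_t dirRaw = fileCount * kEntrySize;
    const std::uint64_t dirBytes =
        (dirRaw + clusterSize - 1) / clusterSize * clusterSize;
    // The folder's own record in its parent; the root has none.
    const std::uint64_t parentEntry = folderIsRoot ? 0 : kEntrySize;
    return fileBytes + dirBytes + parentEntry;
}
```

With 4096-byte clusters, `FlatFolderBytes(1000000, 4096, false)` gives 4 128 002 080 bytes: 4 096 000 000 for the file contents, 32 002 048 for the folder file (7813 clusters) and 32 for the record in the parent.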
  • Suppose a folder contains files named in order, 000000..999999, each of which contains only its own name (6 bytes in ASCII). Roughly how much disk space can this folder with a million files require? - jfs
  • @jfs By my logic, the folder file is 32 million bytes, rounded up to a whole number of clusters. That is assuming no tricks with LFN, and with names like these there should not be any. Plus the files themselves, one cluster apiece. - D-side
  • I don't follow. Could you please update the answer and describe in detail which bytes get written where on the disk in this case? For example, explain why the answer is not ~ 4K * 1000_000 (if the cluster size is 4K)? - jfs
  • @jfs 4k*1_000_000 + 4k * ceil(32_000_000 / 4k) [+ 32] . I have added an example to the answer. - D-side
  • D-side, thanks for the clarification. So, as I understand it, an additional 32 bytes are allocated for the data about each file with a short name. And what about a file or folder with a long name? - rudolfninja