Algorithm for synchronizing the list of files on disk and in the database

Question

When the application starts, a list of folders in the files will be displayed. The process of scanning these folders takes a little time, so I decided to save the names and paths in these files and folders in the database. After that, the startup process was greatly accelerated. But what if the user decides to delete / change or add a folder or file. It is also necessary to preserve the order of files as on the device. How would you solve this question?

To track the removal / addition, some languages have an event model.
And when scanning a device, the files will be processed in the same order in which they are located, and only then the user chooses the order in which to display them
"keep the order of files as on the device" - and what is the order on the device and what is it now?
Add to the question the code where you read the files and add to the database.
Then the user deleted file3 and added file 4. What should be the algorithm to delete file3 from the database and write file4?
On very old systems with direct access to the body of a disk or tape, it was still possible to pervert, and then not always, and it is not necessary.
The file system is responsible for the location of files on the media, which simply gives you a list of files, and in what order you will display their names is your business, the file system and, all the more, the device is not affected.
So your task can be reduced to the synchronization of file system and database data.

Answer 1 · 2017-09-24T23:23:05

Well, the algorithm is so algorithm.

Get the list of files from the file system FSFiles , sort in alphabetical order
We DBFiles list of files from DBFiles database, sorted alphabetically
Let's say got something like this:
```
 N | FSFiles | DBFiles --|---------|-------- 1 | file1 | file1 2 | file2 | file3 3 | file4 | file4 
```
In one cycle with two counters, we review the lists until both counters reach the end.
3.1. If FSFiles[i] = DBFiles[j] , then increment both counters by one and repeat, otherwise 3.2.
3.2. If FSFiles[i] < DBFiles[j] (alphabetical comparison), then add FSFiles[i] to the database, increment i and return to 3.1., Otherwise 3.3.
3.3. If FSFiles[i] > DBFiles[j] , then remove DBFiles[j] from the database, increment j and return to 3.1.
If, in the process of viewing, one of the counters has already reached the end of the file list, then either we finish FSFiles records in the database remaining in FSFiles , or delete everything that is left in the DBFiles from the database, depending on which of them has ended faster.

The algorithm complexity (without pre-sorting of lists) is linear, proportional to the length of FSFiles U DBFiles .

In some systems, it is possible to listen to file system monitoring events (for example, auditing filesystem objects in Windows, though it hurts too much and logs are quickly clogged, but there are)

Some libraries and frameworks allow you to create and use your own (non-system) monitoring (for example, FileSystemWatcher in classic .NET, it has nothing to do with system audit, but, naturally, it listens to the same Win API events as audit, only logs are not cluttering) .

If there is an opportunity to use any kind of event monitoring, then it can be used as the main source of change data, and the algorithm described above can be run at a small frequency to correct monitoring “blunders” that are guaranteed to be with intensive work with files.

Algorithm for synchronizing the list of files on disk and in the database

1 answer 1

More articles: