The server directory contains files uploaded to the site by users. Users often mistakenly upload the same file several times, and the copies end up with names such as ustav.pdf, ustav_0.pdf, ustav_1.pdf. There is no time to figure out which of these files is the original.

I would like to run a script over the directory so that all such files, if they have the same length, are replaced by a symbolic link to one of them.

Could you tell me how to write such a script?

  • You can run something like find . -type f -print0 | xargs -0 md5sum | sort. Files with the same MD5 sum are most likely identical. You can save the MD5 sums to a file, run it through uniq to find the duplicates, and then decide what to do with them. It may turn out that there are only 3-4 duplicate files. - KoVadim
  • I would still recommend hard links rather than symbolic ones. Otherwise deleting files becomes difficult: you delete a file that symlinks pointed to, and those symlinks are left pointing nowhere. That cannot happen with hard links. In general, it would be better to put the check for identical files into the script that handles the user's upload than to keep searching the whole folder for duplicates after the fact. - Mike
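The checksum approach from the first comment can be sketched as a small shell function. This is only an illustration: the name list_dupes is made up here, and the function assumes GNU find, xargs, md5sum and uniq (the -w and --all-repeated options are GNU extensions). It prints groups of files that share an MD5 sum, with a blank line between groups.

```shell
#!/bin/sh
# list_dupes DIR: print groups of files under DIR that share an MD5 sum.
# Hypothetical helper name; assumes GNU coreutils and findutils.
list_dupes() {
    find "$1" -type f -print0 |
        xargs -0 -r md5sum |
        LC_ALL=C sort |
        # An MD5 hex digest is 32 characters long, so compare only those
        # and print every line that belongs to a repeated digest.
        uniq -w32 --all-repeated=separate
}
```

For example, list_dupes . lists the duplicates in the current directory. File names containing newlines or backslashes are escaped by md5sum and will not be grouped correctly by this sketch.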

1 answer

It is better to replace the duplicates with hard links (hardlinks) rather than symbolic links (symlinks).

You can do this, for example, using the findup script included in the fslint package:

 $ /usr/share/fslint/fslint/findup -m directory ... 

By the way, this package also has a GUI wrapper, fslint-gui.

  • And what if findup is not an option? The hosting does not allow installing packages... - Anthony Pirozhenko
  • "And what if findup is not an option?" Why would it not be? A package is just a set of scripts (with many different functions implemented). Download it, unpack it, and use it. Or write your own small script based on the implementation in this package. - aleksandr barakin
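If installing or unpacking fslint is not possible, a small standalone script along the lines suggested in the comments might look like the sketch below. It is not taken from findup; the function name link_dupes is invented for this example, and it assumes GNU find, xargs and md5sum plus a shell whose test command supports -ef (bash and dash do). Files with equal checksums are additionally compared byte by byte with cmp before being replaced by a hard link to the first file of the group.

```shell
#!/bin/sh
# link_dupes DIR: replace byte-identical files under DIR with hard links
# to the first file seen for each checksum.
# Hypothetical helper; names containing newlines or backslashes are
# escaped by md5sum and are simply skipped here.
link_dupes() {
    prev_hash=
    keep=
    find "$1" -type f -print0 | xargs -0 -r md5sum | LC_ALL=C sort |
    while IFS= read -r line; do
        hash=${line%% *}    # checksum: everything before the first space
        file=${line#*  }    # name: everything after the two-space separator
        [ -f "$file" ] || continue
        if [ "$hash" = "$prev_hash" ] && cmp -s "$keep" "$file"; then
            # Same checksum and same bytes: relink unless already linked.
            [ "$keep" -ef "$file" ] || ln -f "$keep" "$file"
        else
            # New checksum (or a collision): this file starts a new group.
            prev_hash=$hash
            keep=$file
        fi
    done
}
```

Calling link_dupes on the upload directory would leave ustav.pdf, ustav_0.pdf and ustav_1.pdf as three names for one file, provided their contents are identical. It is worth trying this on a copy of the directory first, since hard links only work within a single filesystem and a later in-place edit through one name changes the content seen through all of them.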