Which tool can combine several dictionaries into one file and then sort it and remove duplicates?
There are 100+ dictionaries (txt, dic, doc), more than 300 GB in total.
The tool must support UTF-8 and must not strip spaces at the end of a line.
If these are text files, the sort program is enough:
$ sort -u file(s) > result
The -u option means "remove duplicates" ("keep only unique lines").
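A minimal sketch of that command on sample data, to show how it treats the two requirements from the question. The file names and contents are hypothetical, and LC_ALL=C is an addition that the answer does not mention: it makes sort compare raw bytes, so UTF-8 lines are never treated as equal because of locale collation rules. sort itself does not modify lines, so trailing spaces are kept.

```shell
# Hypothetical sample data standing in for the real dictionaries.
workdir=$(mktemp -d)
printf 'пароль\nabc \nabc\n' > "$workdir/a.txt"
printf 'abc\nпароль\nzeta\n' > "$workdir/b.txt"

# Assumption: LC_ALL=C forces byte-wise comparison. UTF-8 lines pass
# through unchanged, and "abc " (trailing space) stays distinct from "abc".
LC_ALL=C sort -u "$workdir/a.txt" "$workdir/b.txt" > "$workdir/result.txt"

cat "$workdir/result.txt"
```

Byte order is not alphabetical order for non-ASCII text, which usually does not matter for a wordlist where only deduplication is needed.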
As for the resources required, see the answers to this question: How could the UNIX sort command sort a very large file?
In brief: an external sort is used (an n-way merge), which means that the file system holding the temporary directory ($TMPDIR, or /tmp, or one given explicitly with -T directory) must have (as far as I understand) at least as much free space for temporary files as the size of the original data.
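The space requirement can be checked before starting a long run. The following is a rough sketch, not a definitive procedure: the directories and sample files are hypothetical, and the "free space at least equal to input size" threshold simply follows the estimate above.

```shell
# Hypothetical paths: small sample data and a temp dir stand in for the real ones.
dictdir=$(mktemp -d)
tmpdir=$(mktemp -d)
printf 'b\na\nb\n' > "$dictdir/one.txt"
printf 'c\na\n' > "$dictdir/two.txt"

# Total input size and free space on the temp file system, both in KiB.
need_kb=$(du -ck "$dictdir"/*.txt | tail -n 1 | cut -f1)
free_kb=$(df -Pk "$tmpdir" | awk 'NR==2 {print $4}')

if [ "$free_kb" -ge "$need_kb" ]; then
    # -T tells sort where to write its temporary files.
    LC_ALL=C sort -u -T "$tmpdir" "$dictdir"/*.txt > "$dictdir/result"
else
    echo "not enough space in $tmpdir: need $need_kb KiB, have $free_kb KiB" >&2
fi
```

If speed matters and GNU sort is in use, its -S (memory buffer size) and --parallel options are worth trying as well; how much they help depends on the machine.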
Source: https://ru.stackoverflow.com/questions/968026/
$ sort -u files > result (the files, of course, must be text). - aleksandr barakin, 5:03 pm
cat dict/*.* | sort | uniq > output.txt
But can it handle such a large amount? Speed of execution is the priority. - Andrew
sort has the -u option - Andrew