Which tool can merge several dictionaries into one file, then sort it and remove duplicates?
There are 100+ dictionaries (txt, dic, doc), more than 300 GB in total.
The tool must support UTF-8 and must not strip trailing spaces at the end of a line.
If these are text files, the sort program is enough:
$ sort -u file(s) > result
The -u option means "remove duplicates" ("keep only unique lines").
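As a minimal sketch of the command above, the following wraps it in a small function and runs it on two throwaway files. The function name merge_dicts and the demo file names are hypothetical. Setting LC_ALL=C is an assumption on my part, not something the answer states: it makes sort compare raw bytes, so UTF-8 lines pass through unchanged and a line with a trailing space stays distinct from the same line without it.

```shell
#!/bin/sh
# Hypothetical helper: merge any number of plain-text word lists into one
# sorted, duplicate-free file. LC_ALL=C makes sort compare raw bytes.
merge_dicts() {
    out=$1
    shift
    LC_ALL=C sort -u -- "$@" > "$out"
}

# Tiny demonstration with throwaway files (names are made up).
demo_dir=$(mktemp -d)
printf 'пароль\nabc \nabc\n' > "$demo_dir/a.txt"
printf 'abc\nпароль\nqwerty\n' > "$demo_dir/b.txt"

merge_dicts "$demo_dir/merged.txt" "$demo_dir/a.txt" "$demo_dir/b.txt"

# "abc" and "abc " (with a trailing space) are both kept as separate lines.
cat "$demo_dir/merged.txt"
```

Note that this only applies to plain-text inputs; .doc files would have to be converted to text first.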
As for the required resources, see the answers to this question: How could the UNIX sort command sort a very large file?
In brief: sort uses an external sort (an n-way merge), which means that the file system holding the temporary directory ($TMPDIR, or /tmp, or the one given explicitly with the -T directory option) must have (as far as I understand) at least as much free space for temporary files as the size of the original data.
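The temporary-directory point can be sketched as follows, assuming GNU coreutils sort (the -S buffer-size option is not part of POSIX). The function name sort_big, the 64M buffer and the demo paths are hypothetical; for real data you would point the first argument at a directory on a file system with enough free space, which you can check with df.

```shell
#!/bin/sh
# Hypothetical wrapper around sort -u with an explicit scratch directory
# (-T) and in-memory buffer size (-S, assumed GNU sort). -o writes the
# result to a file instead of standard output.
sort_big() {
    tmp_dir=$1
    buffer=$2
    out=$3
    shift 3
    LC_ALL=C sort -u -T "$tmp_dir" -S "$buffer" -o "$out" -- "$@"
}

# Tiny demonstration; the paths below are made up.
work=$(mktemp -d)
mkdir "$work/tmp"
printf 'b\na\nb\n' > "$work/1.txt"
printf 'c\na\n' > "$work/2.txt"

sort_big "$work/tmp" 64M "$work/out.txt" "$work/1.txt" "$work/2.txt"
cat "$work/out.txt"
```

A larger buffer means fewer temporary files to merge, so it is one of the few knobs that affect speed on inputs of this size.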
Source: https://ru.stackoverflow.com/questions/968026/
$ sort -u files > result
(the files, of course, must be text). - aleksandr barakin 5:03 pm
cat dict/*.* | sort | uniq > output.txt
But can it handle such a large amount of data? Execution speed is the priority. - Andrew
sort has the -u option. - Andrew