There is a file containing lines, some of which occur more than once.
How can the GNU coreutils utilities (from the GNU operating system; POSIX compatibility is not required) be used to remove the duplicate lines without disturbing the order of the remaining lines?
For example, you can use this pipeline:
$ nl source-file | sort -k 2 -u | sort -n | cut -f 2- > deduped-file

Explanation:

nl — reads lines from stdin (or from the files given as arguments) and prepends a consecutive number to each line; by default the number and the rest of the line are separated by a tab character.

sort -k 2 -u — sorts the list by the second and subsequent fields (-k 2; fields are separated by blanks by default, which includes the tab inserted by nl) and removes duplicates (-u) that compare equal in that key, ignoring the line number in the first field; sort by itself "does not know how" to remove duplicates without sorting.

sort -n — sorts the list numerically (-n); since every line now begins with its original number, this restores the original order (only with gaps where the duplicates were removed).

cut -f 2- — keeps only the fields from the second one onward (-f 2-); for cut, fields are separated by tabs by default.
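A quick way to check the pipeline is to run it on a small sample; the file names and the data here are made-up placeholders, not part of the original answer:

```shell
# Create a sample file with duplicate lines (hypothetical data)
printf 'apple\nbanana\napple\ncherry\nbanana\n' > source-file

# Number the lines, dedupe on the text key, restore the order, strip the numbers
nl source-file | sort -k 2 -u | sort -n | cut -f 2- > deduped-file

cat deduped-file
# apple
# banana
# cherry
```

One caveat worth knowing: by default nl numbers only non-empty lines (-b t), so a file containing blank lines would need nl -b a for this pipeline to treat them like any other line.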
$ info coreutils

If the info program is not installed, you can read the documentation online or the individual man pages (though they usually contain less information):

$ man nl
$ man sort
$ man cut

Awk option
$ awk '!a[$0]++' source-file

Here each line of the file is used as a key into the associative array a. When a line is seen for the first time, the array does not yet contain such an element, so the negation ! yields true and the line is printed. When the same line is seen again, the array already holds a non-zero element under that key (note the post-increment), so the expression evaluates to false and the line is suppressed.
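A quick sanity check of the one-liner (the input is made-up sample data):

```shell
# Duplicates are dropped; the first occurrence of each line survives, in order
printf 'apple\nbanana\napple\ncherry\nbanana\n' | awk '!a[$0]++'
# apple
# banana
# cherry
```

Note that the array holds one entry per distinct line, so memory use grows with the number of unique lines; for ordinary files this is negligible.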
Source: https://ru.stackoverflow.com/questions/537604/