There is a file of text lines, some of which are duplicates.

How can I use utilities from GNU coreutils, part of the GNU operating system (compatibility with the POSIX standard is not required), to remove the duplicates without disturbing the order of the lines?

    2 answers

    For example, you can use this pipeline:

    $ nl source-file | sort -k 2 -u | sort -n | cut -f 2- > sorted-file

    explanation:

    • nl - writes the lines read from stdin (or from the files given as arguments) to stdout, prepending consecutive numbers; by default the number is separated from the rest of the line by a tab character. Note that by default nl numbers only non-empty lines, so empty lines lose their position in this pipeline; if the file contains empty lines, nl -b a numbers every line
    • sort -k 2 -u - sorts the list by the key that starts at the second field and runs to the end of the line ( -k 2 ) and removes lines whose keys are duplicates ( -u ), ignoring the first field (the number). By default sort starts a new field at each transition from a non-blank to a blank character, so the number is the first field and everything after it belongs to the key. The sort program cannot simply "remove duplicates" without sorting
    • sort -n - sorts the list numerically ( -n ); since each line begins with its number, the lines return to their original order (only with gaps in the numbering where duplicates were removed)
    • cut -f 2- - keeps only the fields from the second onward ( -f 2- ), which strips the numbers; cut separates fields by tabs by default.
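The steps above can be traced on a small input. The following is a minimal sketch, not part of the original answer: the sample contents and the variable names `input` and `deduped` are made up for illustration, `-b a` is added so that empty lines would also be numbered, and `LC_ALL=C` is an assumption added so that sort compares raw bytes and only byte-identical lines count as duplicates.

```shell
# Hypothetical sample input; file name and contents are illustrative only.
input=$(mktemp)
printf '%s\n' banana apple banana cherry apple > "$input"

# Stage 1: prepend a line number and a tab to every line.
nl -b a "$input"

# Full pipeline: number, deduplicate on the text, restore the original
# order by number, then strip the numbers again.
deduped=$(nl -b a "$input" | LC_ALL=C sort -k 2 -u | sort -n | cut -f 2-)
printf '%s\n' "$deduped"

rm -f "$input"
```

For this input the pipeline prints banana, apple, cherry: the first occurrence of each line survives, in the order of first appearance.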

    additional reading:

     $ info coreutils 

    If the info program is not installed, you can read the online documentation or the individual man pages (which usually contain less information):

     $ man nl
     $ man sort
     $ man cut

      Awk option

       $ awk '!a[$0]++' source-file

      Here, each line of the file becomes a key in the associative array a . The first time a line is seen, the array does not yet contain such an element, so a[$0] evaluates to zero and the negation ! yields true, i.e. the line is printed. Because of the post-increment, the element is non-zero afterwards, so when the same line appears again the expression evaluates to false and the line is skipped.
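To make the one-liner's logic explicit, here is a minimal sketch that runs it next to a spelled-out equivalent. The sample input and the names `input`, `short`, `long` and `seen` are hypothetical and chosen only for this illustration; the expanded form is one possible way to write the same test, not the answer author's code.

```shell
# Hypothetical sample input; file name and contents are illustrative only.
input=$(mktemp)
printf '%s\n' banana apple banana cherry apple > "$input"

# Short form from the answer: the pattern is true only the first time
# a given line is seen, and a true pattern prints the line.
short=$(awk '!a[$0]++' "$input")

# Expanded form: the post-increment returns the old count, which is 0
# on first sight, and then stores count + 1 for later lines.
long=$(awk '{
    if (seen[$0]++ == 0)
        print
}' "$input")

printf '%s\n' "$short"

rm -f "$input"
```

Both forms print banana, apple, cherry for this input. Unlike the sort-based pipeline, this approach keeps every distinct line in memory, which may matter for very large files.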

      • A good option. I think the fact that awk is not part of GNU coreutils can be overlooked. - aleksandr barakin
      • @alexanderbarakin indeed it isn't; for some reason I thought awk was part of coreutils. - Vladimir Gamalyan
      • The GNU operating system does, of course, have its own awk implementation, but it is a separate package that is not directly connected with coreutils. - aleksandr barakin