The task is to count the number of lines in each file in the current directory, sort the results, and write them to a file.

The problem is that I do not know how to identify all the text files in the folder.

  • Count the spaces, carriage returns and tabs in each file. If they make up more than a certain percentage of the file size, the file is text. A heuristic, of course, but it might fit. Large files can be ignored. wc and grep can help. - igumnov
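That heuristic can be sketched in a few lines of shell. The function name and the 5% threshold below are made up for illustration; tune them to taste:

```shell
# Hypothetical sketch of the whitespace heuristic: call a file "probably
# text" if spaces, tabs, CRs and LFs make up at least 5% of its bytes.
is_probably_text() {
    f="$1"
    total=$(wc -c < "$f")
    [ "$total" -gt 0 ] || return 1          # treat empty files as non-text
    ws=$(tr -cd ' \t\n\r' < "$f" | wc -c)   # keep only whitespace bytes, count them
    [ $((ws * 100 / total)) -ge 5 ]
}
```

Usage would then be something like `is_probably_text somefile && wc -l somefile`.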

1 answer

I would suggest such a perversion:

find . -maxdepth 1 -type f \
    -exec sh -c "file -bi '{}' | grep -q ^text/ && echo '{}'" \; \
  | xargs wc -l | head -n -1 | sort -gk 1 > line_counts.txt

What is going on here:

  • find . -maxdepth 1 -type f - search for all files ( -type f ) in the current directory, without descending into subdirectories ( -maxdepth 1 )
  • -exec sh -c "…" \; - for each of them run the command sh -c "…" (the file name is substituted for " {} "). The point of this is that we cannot simply stick a pipe inside find - it would not understand it - so we have to invoke a shell.
  • file -bi '{}' - determine the MIME type of the file ( -i ), without printing the file name itself ( -b ). This may not be the most accurate detection, see the notes below.
  • grep -q ^text/ - match lines that start with " text/ ", but print nothing ( -q ) - only report via the exit code whether anything was found
  • && echo '{}' - if a match was found, the right-hand side of && runs and the file name is printed.
  • xargs wc -l - all incoming file names are passed to wc as arguments, and wc counts their lines ( -l ).
  • head -n -1 - cut off the last line, which holds the grand total
  • sort -gk 1 - sort numerically ( -g ) by the first field ( -k 1 )
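To see the individual building blocks in isolation (the exact MIME string depends on your file(1) version and platform):

```shell
# Create a small sample file and inspect its MIME type:
printf 'hello\n' > /tmp/demo.txt
file -bi /tmp/demo.txt     # prints something like "text/plain; charset=us-ascii"

# grep -q prints nothing; it only reports a match via its exit code,
# which is what lets && gate the echo:
file -bi /tmp/demo.txt | grep -q '^text/' && echo 'looks like text'
```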

Variations are possible. In particular, I think matching only ^text/ is too restrictive (some files containing UTF-8 text have MIME types under application/* ; on the other hand, there is also application/octet-stream , which, generally speaking, is never text), so something in the spirit of file -b '{}' | grep -Fq ' text' might work better. Also, if there are a lot of files and they have long names, xargs will invoke wc more than once, producing several "total" lines; to call wc exactly once per file, use " xargs -I '{}' wc -l '{}' ".
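Putting both of those tweaks together, a variant might look like this (wrapped in a function purely for convenience - the name is mine). Note that with one wc call per file there is no "total" line, so head -n -1 is no longer needed:

```shell
# Match on file(1)'s human-readable description instead of the MIME type,
# and run wc once per file so no "total" line appears in the output.
count_text_lines() {
    find . -maxdepth 1 -type f \
        -exec sh -c "file -b '{}' | grep -Fq ' text' && echo '{}'" \; \
      | xargs -I '{}' wc -l '{}' \
      | sort -gk 1
}
```

Usage: `count_text_lines > line_counts.txt`.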

Yes, I mostly used GNU utilities (GNU findutils, GNU coreutils, GNU grep), except for BSD's file . On non-GNU systems the utilities may behave differently, and some of these options may be missing. In general, YMMV - if in doubt, see the documentation.

All this, however, will break if some fan of the strange creates a file whose name contains a newline character ( \n ): the part of the pipe starting with xargs will choke. To fix it, terminate each file name with a NUL byte instead of a newline (e.g. printf '%s\0' instead of echo ) and pass -0 ( --null ) to xargs .
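A newline-safe sketch (again wrapped in a function for illustration): pass the name to the inner shell as a positional argument instead of splicing {} into the command string, emit each name NUL-terminated via printf, and let xargs split on NUL with -0:

```shell
# Newline-safe variant: file names are emitted NUL-terminated,
# and xargs -0 splits on NUL bytes instead of whitespace.
count_text_lines0() {
    find . -maxdepth 1 -type f \
        -exec sh -c 'file -bi "$1" | grep -q "^text/" && printf "%s\0" "$1"' sh '{}' \; \
      | xargs -0 -I '{}' wc -l '{}' \
      | sort -gk 1
}
```

Passing the name as "$1" also sidesteps the quoting problems that embedding {} directly in the sh -c string would cause.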

  • Plus one for the perversion) I once entertained myself the same way))) - thunder
  • Unix apparently drives people to this - skegg
  • It can be rewritten a bit more simply - maybe not very robust, but it fits on one line: wc -l `file * | grep text | awk '{ print $1 }' | tr -d :` | sort -gk 1 - avp
  • @avp: Yes, but it will break on the first file whose name contains spaces or colons. There happened to be one in the very directory where I tested it. - drdaeman
  • And what exactly doesn't work? Well, yes, besides the colons, spaces and so on, the last line (the total) also needs to be removed - as I said, it's just a quick hack. For a real task you'd have to think it through, and the main thing is to decide what counts as a text file. If file names can be anything at all, I wouldn't torture the shell and would write it in C instead (and if the file utility identifies the type correctly, I'd call it via popen). - avp