There are a lot of logs from the server. How, and with what tools, can they be analyzed on the local host? I'm interested in OS, browsers, and visitor counts. Thanks in advance.

Closed as too broad by aleksandr barakin, Kromster, user194374, rjhdby, post_zeew on December 19, '16 at 13:19.

Please edit the question so that it describes a specific problem in enough detail to identify an adequate answer. Avoid asking several questions at once. See "How do I ask a good question?" for guidance. If the question can be reworded according to the rules laid out in the Help Center, edit it.

  • What kind of logs? - don Rumata
  • access and error logs from nginx - alexin

1 answer

You can install any log-analysis system, copy the logs over yourself, and point the system at them.

For example:

Or you can just write a couple of lines in bash / perl / or even php with regular expressions and pull out everything you need. For example, we have nginx logs and want to see the popularity of browsers.

egrep -o -h '"[^"]*"$' access.log* | sort | uniq -c | sort -n -k 1 

Let's take it apart pipe by pipe (the vertical bar):

  • first, egrep pulls out the ends of the lines: in my logs the user agent comes at the very end, in double quotes. It looks through all access.log* files; the -h option tells it not to print file names, which would only get in the way.
  • then we sort everything, because next comes uniq -c, which collapses duplicates and prints their counts, and it needs a sorted list.
  • finally, we sort the list again. The -k 1 option says to sort by the first column (the count), and -n says to compare numerically (by default sorting is lexicographic, so the numbers would come out as 1 11 2 instead of the expected 1 2 11).

But what if it's operating systems you're interested in? That's a bit trickier, because you have to hope the browsers report them honestly, and bots usually send fake user agents. All it takes is tweaking the regex, as in the sketch below.
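A minimal sketch of such a tweak, assuming the default nginx combined format with the user agent in the last quoted field. The OS patterns here are rough illustrations, and one user agent can match more than one of them (Android user agents also contain "Linux"), so treat the counts as approximate:

# pull the user-agent field, then extract a rough OS marker from each one
egrep -o -h '"[^"]*"$' access.log* | egrep -o 'Windows NT|Android|iPhone|Macintosh|Linux' | sort | uniq -c | sort -n -k 1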

Or say I want statistics on IP addresses. That's also simple: the IP comes first on the line, before the first space (we're still talking about the default nginx logs).

 cut -f 1 -d ' ' access.log* | sort | uniq -c | sort -k 1 -n 

Extracting the number of users is where it gets genuinely tricky. First you have to decide exactly how to count them. If you've planned ahead and write some unique identifier to the log (for example, a session cookie, or the username shows up in the URL), then there's no problem: pick a regex to cut it out and you're done. A couple of sketches follow below.
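For instance, here is a rough count of unique visitors by IP (a crude proxy, since NAT and proxies hide many users behind one address), plus a sketch of cutting out a session id, assuming the URL carries a hypothetical sessid= parameter (the name is made up for illustration):

# unique IPs as a crude visitor count
cut -f 1 -d ' ' access.log* | sort -u | wc -l
# visits per session, assuming a hypothetical ?sessid=... parameter in the request URL
egrep -o -h 'sessid=[^&" ]+' access.log* | sort | uniq -c | sort -n -k 1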

Logs are often stored compressed, and then you don't even need to unpack them: many utilities have a "z" counterpart. Where cat can print a plain text file, zcat can print one packed into a gzip archive, and zgrep searches inside them.
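For example, the same browser count can run straight over rotated, gzipped logs, assuming the usual access.log.N.gz names that logrotate produces:

# zcat decompresses and concatenates all rotated logs onto stdout
zcat access.log.*.gz | egrep -o '"[^"]*"$' | sort | uniq -c | sort -n -k 1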

Since logs usually don't change that quickly, such scripts have every right to live in small projects. In large installations it's better to use the systems mentioned above, which parse logs in real time and let you run all kinds of queries and set up notifications.

  • Thanks for such a detailed answer! - alexin