The statistics log contains entries like this:

<- 21.03.2016 (15:10:30) | 127.0.0.1 | Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36 | domain.ru | referer ->

I parse the records in real time without writing intermediate results to a database (I may reconsider that later).

To explain: how do I take this data apart? For example, when parsing I want to get: how many visitors there were in total today / this week / yesterday, and how many visitors came to a particular domain (domain.ru) today / this week / yesterday.

I have solved the problem after a fashion. The chain is: split the log into separate records -> count how many people there were yesterday and today simply by date (and how do I select the current week?). For a domain the operations are similar... but this feels clumsy to me.
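A minimal sketch of that chain, assuming the field order date | ip | user-agent | domain | referer from the sample record and a hypothetical file name stats.log (here "this week" is taken as the last 7 days including today):

    from collections import defaultdict
    from datetime import date, datetime, timedelta

    per_day = defaultdict(int)         # "2016-03-21" -> total visits that day
    per_domain_day = defaultdict(int)  # ("domain.ru", "2016-03-21") -> visits

    with open("stats.log", encoding="utf-8") as log:
        for line in log:
            line = line.strip()
            if not (line.startswith("<-") and line.endswith("->")):
                continue
            body = line[2:-2].strip()              # drop the "<-" / "->" markers
            fields = [f.strip() for f in body.split("|")]
            if len(fields) < 5:
                continue
            stamp, ip, user_agent, domain, referer = fields[:5]
            # "21.03.2016 (15:10:30)" -> "2016-03-21"
            day = datetime.strptime(stamp, "%d.%m.%Y (%H:%M:%S)").date().isoformat()
            per_day[day] += 1
            per_domain_day[(domain, day)] += 1

    today = date.today()
    week = [(today - timedelta(days=i)).isoformat() for i in range(7)]
    print("today:", per_day[today.isoformat()])
    print("this week:", sum(per_day[d] for d in week))
    print("domain.ru today:", per_domain_day[("domain.ru", today.isoformat())])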

    1 answer

    It is best to store such information by day: keep an associative array and increment the value of the element whose key is the current day.

     2016-03-20  93243
     2016-03-21 100532
     2016-03-22 103423
     ...

    If you are not counting unique hosts / IP addresses but simply the total number of visits, then summing the last 7 days gives the number of visits for the week: just add up the last 7 elements. If you do count hosts (unique IP addresses), this approach becomes inaccurate, since the same IP address can appear on several different days; you would need a separate associative array keyed by week.
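    A small sketch of both structures (the names are made up for the example): a plain per-day counter for visits and a per-week set of IP addresses for unique hosts, so an IP that appears on several days of the same week is counted once:

        from collections import defaultdict
        from datetime import date, timedelta

        visits_by_day = defaultdict(int)   # "2016-03-21" -> visits
        hosts_by_week = defaultdict(set)   # "2016-W12"   -> unique IPs

        def register(day: date, ip: str) -> None:
            visits_by_day[day.isoformat()] += 1
            year, week, _ = day.isocalendar()
            hosts_by_week[f"{year}-W{week:02d}"].add(ip)

        def visits_last_7_days(today: date) -> int:
            # "sum up the last 7 elements" of the per-day counter
            return sum(visits_by_day[(today - timedelta(days=i)).isoformat()]
                       for i in range(7))

        def unique_hosts_in_week(day: date) -> int:
            year, week, _ = day.isocalendar()
            return len(hosts_by_week[f"{year}-W{week:02d}"])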

    To store and process this information, instead of a slow database you can use a fast NoSQL solution, for example Redis. Redis keeps its data in RAM, so counting is nearly as fast as if the data were in your program's own memory. Many NoSQL databases also offer aggregation features: sets, for instance, keep only unique values, and if you store counters under keys that encode the time interval, you can later pull sums over them. The main point: everything lives in RAM and is very fast.
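    The same idea on top of Redis, as a rough sketch (it needs the third-party redis-py package and a running Redis server; the key names visits:... and hosts:... are invented for the example): INCR keeps per-day counters, SADD keeps a per-day set of unique IPs, and SCARD returns that set's size.

        from datetime import date, timedelta

        import redis

        r = redis.Redis(host="localhost", port=6379, db=0)

        def register(day: date, ip: str, domain: str) -> None:
            key = day.isoformat()
            r.incr(f"visits:{key}")             # total visits for the day
            r.incr(f"visits:{domain}:{key}")    # visits to one domain that day
            r.sadd(f"hosts:{key}", ip)          # a set keeps IPs unique

        def visits_last_7_days(today: date) -> int:
            keys = [f"visits:{(today - timedelta(days=i)).isoformat()}"
                    for i in range(7)]
            return sum(int(v) for v in r.mget(keys) if v is not None)

        def unique_hosts_on(day: date) -> int:
            return r.scard(f"hosts:{day.isoformat()}")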