There is an event log into which all sorts of garbage gets written, for example:

    Fri Feb 21 04:50:53 2014
    Thread 1 cannot allocate new log, sequence 13184
    Private strand flush not complete
      Current log# 3 seq# 13183 mem# 0: /opt/oracle/admin/BITS/redolog/redo03.log
    Thread 1 advanced to log sequence 13184
      Current log# 4 seq# 13184 mem# 0: /opt/oracle/admin/BITS/redolog/redo04.log
    Fri Feb 21 04:51:02 2014
    LNS: Standby redo logfile selected for thread 1 sequence 13184 for destination LOG_ARCHIVE_DEST_1
    Fri Feb 21 05:00:53 2014
    Thread 1 cannot allocate new log, sequence 13185
    Private strand flush not complete
      Current log# 4 seq# 13184 mem# 0: /opt/oracle/admin/BITS/redolog/redo04.log
    Thread 1 advanced to log sequence 13185
      Current log# 3 seq# 13185 mem# 0: /opt/oracle/admin/BITS/redolog/redo03.log
    Fri Feb 21 05:00:57 2014
    LNS: Standby redo logfile selected for thread 1 sequence 13185 for destination LOG_ARCHIVE_DEST_1

It also contains records that are needed and interesting, for example:

    ORA-0155 caused by SQL statement below (SQL ID: 39c4440, Query Duration=13923299 sec, SCN: 0x0001.48500a):
    SELECT * FROM RELATIONAL("BILL"."REQUEST")
    The value (30) of MAXTRANS parameter ignored.
    kupprdp: master process DM00 started with pid=100, OS id=11330
             to execute - SYS.KUPM$MCP.MAIN('SYS_EXPORT_FULL_06', 'SYS', 'KUPC$C_1_201402210646', 'KUPC$S_1_2014021064146', 0);
    kupprdp: worker process DW01 started with worker id=1, pid=109, OS id=11332
             to execute - SYS.KUPW$WORKER.MAIN('SYS_ERT_FULL_06', 'SYS');
    ALTER SYSTEM SET undo_retention=90 SCOPE=BOTH;

Right now I simply filter out all the garbage lines with the in operator, but the problem is that the date lines cannot be filtered that way.

Here is a rough implementation of what I have so far:

    with open(log_filename, 'r') as log_file:
        for line in log_file:
            if 'LNS: Standby redo logfile selected for thread' not in line:
                if '...' not in line:
                    ...
                    print(line)

Is there an example of solving a similar filtering problem that also handles the dates?


    In your case, it is known which lines need to be dropped, but it is not known which lines should remain.

    An implementation based on a "blacklist" will do (i.e., we discard everything we can identify as garbage; whatever is not discarded is considered good).

    You have two kinds of checks: simple ones (where the in operator is enough) and complex ones (where regular expressions are required).

    I propose to group simple and complex checks as follows:

    # Substrings to be checked with the `in` operator are collected in this list
    bad_line_parts = [
        'Standby redo logfile selected for thread',
        ...
    ]

    More complex checks may indeed be more convenient to implement with regular expressions. In your case, the dates can be handled by a fairly simple expression, with a small chance of discarding something extra (and a small chance of keeping something that should have been dropped):

    import re

    bad_reg_exprs = [
        re.compile(r'[A-Z][a-z]{2} [A-Z][a-z]{2} \d{1,2} \d\d:\d\d:\d\d \d{4}'),
        ...  # add other regular expressions here if needed
    ]
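As a quick sanity check, here is a minimal sketch that tries such a date pattern against sample lines taken from the log excerpt in the question (note that the character classes need hyphens, e.g. `[A-Z]`, and a raw string avoids backslash escaping issues):

```python
import re

# Timestamp pattern modeled on lines like "Fri Feb 21 04:50:53 2014"
date_re = re.compile(r'[A-Z][a-z]{2} [A-Z][a-z]{2} \d{1,2} \d\d:\d\d:\d\d \d{4}')

# A date line matches, an ordinary log message does not
print(bool(date_re.match('Fri Feb 21 04:50:53 2014')))
print(bool(date_re.match('Thread 1 advanced to log sequence 13184')))
```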

    Filtering can now be written like this:

    def is_fine_line(line):
        return (not any(bad_line_part in line for bad_line_part in bad_line_parts)
                and not any(bad_reg_expr.match(line) for bad_reg_expr in bad_reg_exprs))

    with open(log_filename, 'r') as log_file:
        for line in log_file:
            if is_fine_line(line):
                print(line)

    If you wish, you can extend is_fine_line with additional checks in the same way.
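For instance, a hedged sketch of one such extension, with a hypothetical extra check that also drops blank lines (the short example lists stand in for the full ones above):

```python
import re

# Abbreviated example lists; in practice these hold all your checks
bad_line_parts = ['Standby redo logfile selected for thread']
bad_reg_exprs = [
    re.compile(r'[A-Z][a-z]{2} [A-Z][a-z]{2} \d{1,2} \d\d:\d\d:\d\d \d{4}'),
]

def is_fine_line(line):
    if not line.strip():  # hypothetical extra check: drop empty lines too
        return False
    return (not any(part in line for part in bad_line_parts)
            and not any(rx.match(line) for rx in bad_reg_exprs))
```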

    This way, the functionality responsible for filtering lines is factored out of the main loop, which should improve readability.

    Also, it became possible to use the new function as a filtering condition:

     filtered_lines = (line for line in log_file if is_fine_line(line)) 
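Putting it all together, a minimal end-to-end sketch (io.StringIO stands in here for the real log file, and the abbreviated lists stand in for the full ones):

```python
import io
import re

bad_line_parts = ['Standby redo logfile selected for thread']
bad_reg_exprs = [
    re.compile(r'[A-Z][a-z]{2} [A-Z][a-z]{2} \d{1,2} \d\d:\d\d:\d\d \d{4}'),
]

def is_fine_line(line):
    return (not any(part in line for part in bad_line_parts)
            and not any(rx.match(line) for rx in bad_reg_exprs))

# A tiny in-memory log built from lines in the question
log_file = io.StringIO(
    'Fri Feb 21 04:50:53 2014\n'
    'LNS: Standby redo logfile selected for thread 1 sequence 13184\n'
    'ALTER SYSTEM SET undo_retention=90 SCOPE=BOTH;\n'
)

# Only the ALTER SYSTEM line survives the filtering
filtered_lines = [line for line in log_file if is_fine_line(line)]
```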
    • it is better to use awk, the syntax is more concise - hardsky

    You need to use regular expressions (regexp).
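For example, a hedged sketch of the regexp-based filtering this answer suggests; the pattern is an assumption modeled on the timestamps in the question:

```python
import re

# Matches leading timestamps like "Fri Feb 21 05:00:57 2014"
date_re = re.compile(r'[A-Z][a-z]{2} [A-Z][a-z]{2} \d{1,2} \d{2}:\d{2}:\d{2} \d{4}')

lines = [
    'Fri Feb 21 05:00:57 2014',
    'ALTER SYSTEM SET undo_retention=90 SCOPE=BOTH;',
]
# Keep only the lines that do not start with a timestamp
kept = [line for line in lines if not date_re.match(line)]
```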