How to parse duplicate data into an iterable structure (list, dictionary)?

Question

The server sends me responses in one of the following formats (which one arrives is random):

 ubs:89564: current: oil_change:""
 client:2629: dist:<empty>
 client:2629: spec modes:DW
 client:2629: timer:2017.01.04 17:38:46
 client:2629: pack:391, gov_num:213042, start:2016.12.02 02:05:25, end:2016.12.09 02:05:25
 client:2629: pack:392, gov_num:213043, start:2016.12.02 02:05:25, end:2016.12.12 02:05:25
 client:2629: pack:230, gov_num:211624, start:2016.11.22 14:30:37, end:2999.12.31 00:00:01
 subs:7090: stnd:1, sbst:active, rtst:0

or

 client:7090: queue:118,117
 client:7090: discount:LU,MAX
 client:7090: charge period:2017.01.05 14:42:58/2017.01.06 14:42:58, qad:155
 client:7090: charge period:2017.01.05 14:42:16/2017.01.06 14:42:16, pack:61:214166
 client:7090: timer:2017.02.04 14:42:22
 client:7090: rtpl:30, rtpl join:2017.01.05 14:42:22, zone:2, doc:"250026529", start:2016.01.05 14:42:22, end:2999.12.31 00:00:01, current

I need to detect whether the response contains a line in this format: "client:2629: pack:392, gov_num:213043, start:2016.12.02 02:05:25, end:2016.12.12 02:05:25". That is, if the response arrives in the second format, it should not be parsed at all. I then need to collect the values of pack, gov_num, start, and end into a list. So far I have put together regular expressions such as "(?<=pack:)-?\d+\.?\d*" to extract each value separately, but I do not see how to get a new list for each new customer. For the first response, the result I want looks something like this (a list or a dictionary, it does not matter):

 [(391, 213042, 2016.12.02 02:05:25, 2016.12.09 02:05:25), (392, 213043, 2016.12.02 02:05:25, 2016.12.12 02:05:25), (230, 211624, 2016.11.22 14:30:37, 2999.12.31 00:00:01)]

MaxU · Accepted Answer · 2017-01-05T12:49:03

For example:

 import re

 In [182]: print(s)
 ubs:89564: current: oil_change:""
 client:2629: dist:<empty>
 client:2629: spec modes:DW
 client:2629: timer:2017.01.04 17:38:46
 client:2629: pack:391, gov_num:213042, start:2016.12.02 02:05:25, end:2016.12.09 02:05:25
 client:2629: pack:392, gov_num:213043, start:2016.12.02 02:05:25, end:2016.12.12 02:05:25
 client:2629: pack:230, gov_num:211624, start:2016.11.22 14:30:37, end:2999.12.31 00:00:01
 subs:7090: stnd:1, sbst:active, rtst:0

 In [183]: re.findall(r'\s+pack:(\d+),\s*gov_num:(\d+),\s*start:([^\,\n\r]*),\s*end:([^\,\n\r]*)', s)
 Out[183]:
 [('391', '213042', '2016.12.02 02:05:25', '2016.12.09 02:05:25'),
  ('392', '213043', '2016.12.02 02:05:25', '2016.12.12 02:05:25'),
  ('230', '211624', '2016.11.22 14:30:37', '2999.12.31 00:00:01')]

 In [184]: print(s2)
 client:7090: queue:118,117
 client:7090: discount:LU,MAX
 client:7090: charge period:2017.01.05 14:42:58/2017.01.06 14:42:58, qad:155
 client:7090: charge period:2017.01.05 14:42:16/2017.01.06 14:42:16, pack:61:214166
 client:7090: timer:2017.02.04 14:42:22
 client:7090: rtpl:30, rtpl join:2017.01.05 14:42:22, zone:2, doc:"250026529", start:2016.01.05 14:42:22, end:2999.12.31 00:00:01, current

 In [185]: re.findall(r'\s+pack:(\d+),\s*gov_num:(\d+),\s*start:([^\,\n\r]*),\s*end:([^\,\n\r]*)', s2)
 Out[185]: []
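If you also want a separate list per client and real numbers and dates instead of strings, one possible variation is sketched below. It is not part of the session above: the names `RECORD`, `DATE_FORMAT`, and `parse_packs` are made up for illustration, and the pattern assumes every record of interest starts with `client:<id>: pack:` and that dates always look like `YYYY.MM.DD HH:MM:SS`. Because the date fields are matched explicitly, the sketch should not depend on where the line breaks fall.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical pattern: like the findall() pattern, but it also captures the
# client id and matches the date fields explicitly.
RECORD = re.compile(
    r'client:(?P<client>\d+):\s*'
    r'pack:(?P<pack>\d+),\s*'
    r'gov_num:(?P<gov_num>\d+),\s*'
    r'start:(?P<start>\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}),\s*'
    r'end:(?P<end>\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2})'
)
DATE_FORMAT = '%Y.%m.%d %H:%M:%S'  # assumed timestamp layout


def parse_packs(response):
    """Group (pack, gov_num, start, end) tuples by client id.

    Returns an empty dict when no line has the expected format.
    """
    packs = defaultdict(list)
    for m in RECORD.finditer(response):
        packs[int(m.group('client'))].append((
            int(m.group('pack')),
            int(m.group('gov_num')),
            datetime.strptime(m.group('start'), DATE_FORMAT),
            datetime.strptime(m.group('end'), DATE_FORMAT),
        ))
    return dict(packs)
```

Calling `parse_packs(s)` on the first response should give a dictionary with a single key, `2629`, holding three tuples, while `parse_packs(s2)` should give an empty dictionary, which you can use as the "do not parse this response" signal.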
