Hello!

I have a dictionary that contains many more elements than this example:

a = { 'hosts': [{u'available': u'0', u'disable_until': u'0', u'error': u'', u'errors_from': u'0', u'groups': [{u'groupid': u'361'}], u'host': u'my.hostname.local', u'hostid': u'144844', u'interfaces': {u'302972': {u'dns': u'my.hostname.local', u'hostid': u'144844', u'interfaceid': u'302972', u'ip': u'1.1.1.1', u'main': u'1', u'port': u'161', u'type': u'2', u'useip': u'1'}}, u'ipmi_authtype': u'-1', u'ipmi_available': u'0', u'ipmi_disable_until': u'0', u'ipmi_error': u'', u'ipmi_errors_from': u'0', u'ipmi_password': u'', u'ipmi_privilege': u'2', u'ipmi_username': u'', u'jmx_available': u'0', u'jmx_disable_until': u'0', u'jmx_error': u'', u'jmx_errors_from': u'0', u'lastaccess': u'0', u'maintenance_from': u'0', u'maintenance_status': u'0', u'maintenance_type': u'0', u'maintenanceid': u'0', u'maintenances': [], u'proxy_hostid': u'11223', u'snmp_available': u'2', u'snmp_disable_until': u'0', u'snmp_errors_from': u'0', u'status': u'0'}] } 

On top of that, as you have probably noticed, it is in Unicode. There can be one or hundreds of elements under u'interfaces'. So, the questions:

  1. How do I call .encode("utf-8") on every element to get rid of the u' prefix and make this dictionary more convenient to work with later?

  2. Is there a convenient way in Python to search a dictionary without endless nested for key, value in ... loops? Specifically, from all this mess I need to get b = {"my.hostname.local" : {"ip" : "1.1.1.1"}}, where my.hostname.local comes from the u'host' field and ip is taken from the entry in u'interfaces' that has u'type': u'2'. I know how to do this with a pile of for loops, but that is ugly and incredibly inconvenient. Is there a better and faster way?

  • How do you picture the search process? What do you pass in and what do you expect to get back? Please give sample input and output. Especially for this part: "ip is taken from the entry in u'interfaces' that has u'type': u'2'". Do you need a general search method, or just a way to get some specific data? - Max
  • @Max, hello. First I need to get rid of all this Unicode, because even with for loops I cannot solve the problem the way I had planned. The search process is trivial: an a.find('ip') function that would output all ip keys with their values and, ideally, the "path" by which each one can be reached. It is so basic, why has nobody done this yet? I described above what I need in the end, but I expect that in the future I will need to look up other things in this dictionary. - user231551
  • @user231551, can you post a more complex example (as a link to a JSON file, or as Python code like in the question)? Is there always one host, or can there be many? Are interfaces always a dictionary, or can they be a list? - MaxU
  • @MaxU there is always exactly one hosts key. It always contains one list made up of a number of dictionaries. Each of those can have only one interfaces key, but under it there can be many numeric keys whose values can differ. Sorry if that is hard to follow. - user231551
  • @user231551, does that single list contain only one element (host), or can there be several? - MaxU
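The a.find('ip') idea from the comments above can be sketched as a small recursive generator. This is only an illustration under stated assumptions: the name find_key and the trimmed-down sample dictionary are made up for this example, and the function only descends into dicts, lists, and tuples.

```python
def find_key(obj, key, path=()):
    """Yield (path, value) for every occurrence of `key` in nested dicts/lists.

    `find_key` is a hypothetical helper, not part of any library.
    """
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield path + (k,), v
            # Keep descending: the value itself may contain further matches.
            for found in find_key(v, key, path + (k,)):
                yield found
    elif isinstance(obj, (list, tuple)):
        for i, item in enumerate(obj):
            for found in find_key(item, key, path + (i,)):
                yield found


# Trimmed-down sample in the same shape as the dictionary in the question.
sample = {'hosts': [{'host': 'my.hostname.local',
                     'interfaces': {'302972': {'ip': '1.1.1.1', 'type': '2'}}}]}

matches = list(find_key(sample, 'ip'))
print(matches)
```

Each result is a pair: the tuple of keys and list indexes leading to the match, and the value stored there, so the same helper can later be reused for other keys.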

2 answers

A solution using the pandas module:

    import numpy as np
    import pandas as pd

    def get_ip(dct):
        # Return {dns: {'ip': ip}} for an interface of type '2', NaN otherwise.
        if isinstance(dct, dict):
            if 'type' in dct and 'ip' in dct and 'dns' in dct and dct['type'] == '2':
                return {dct['dns']: {'ip': dct['ip']}}
        # pd.np is no longer available in recent pandas, so use numpy directly.
        return np.nan

    In [113]: (pd.DataFrame(a['hosts'])
         ...:    .interfaces
         ...:    .apply(pd.Series)
         ...:    .applymap(get_ip)
         ...:    .apply(lambda x: x[x.first_valid_index()], axis=1)
         ...:    .tolist()
         ...: )
         ...:
    Out[113]:
    [{'my.hostname.local': {'ip': '1.1.1.1'}},
     {'my01.hostname.local': {'ip': '1.1.1.1'}},
     {'my13.hostname.local': {'ip': '1.1.1.1'}}]

PS I used dns from interfaces as the hostname

Setup:

 a = { 'hosts': [{u'available': u'0', u'disable_until': u'0', u'error': u'', u'errors_from': u'0', u'groups': [{u'groupid': u'361'}], u'host': u'my.hostname.local', u'hostid': u'144844', u'interfaces': {u'302972': {u'dns': u'my.hostname.local', u'hostid': u'144844', u'interfaceid': u'302972', u'ip': u'1.1.1.1', u'main': u'1', u'port': u'161', u'type': u'2', u'useip': u'1'}}, u'ipmi_authtype': u'-1', u'ipmi_available': u'0', u'ipmi_disable_until': u'0', u'ipmi_error': u'', u'ipmi_errors_from': u'0', u'ipmi_password': u'', u'ipmi_privilege': u'2', u'ipmi_username': u'', u'jmx_available': u'0', u'jmx_disable_until': u'0', u'jmx_error': u'', u'jmx_errors_from': u'0', u'lastaccess': u'0', u'maintenance_from': u'0', u'maintenance_status': u'0', u'maintenance_type': u'0', u'maintenanceid': u'0', u'maintenances': [], u'proxy_hostid': u'11223', u'snmp_available': u'2', u'snmp_disable_until': u'0', u'snmp_errors_from': u'0', u'status': u'0'}, {u'available': u'0', u'disable_until': u'0', u'error': u'', u'errors_from': u'0', u'groups': [{u'groupid': u'361'}], u'host': u'my2.hostname.local', u'hostid': u'144844', u'interfaces': {u'302973': {u'dns': u'my01.hostname.local', u'hostid': u'144844', u'interfaceid': u'302973', u'ip': u'1.1.1.1', u'main': u'1', u'port': u'161', u'type': u'2', u'useip': u'1'}, u'321123': {u'dns': u'my02.hostname.local', u'hostid': u'321', u'interfaceid': u'321312', u'ip': u'2.2.2.2', u'type': u'3', u'useip': u'1'}, u'675675': {u'dns': u'my03.hostname.local', u'hostid': u'566454', u'interfaceid': u'8987798', u'ip': u'3.3.3.3', u'type': u'1', u'useip': u'1'}}, u'ipmi_authtype': u'-1', u'ipmi_available': u'0', u'ipmi_disable_until': u'0', u'ipmi_error': u'', u'ipmi_errors_from': u'0', u'ipmi_password': u'', u'ipmi_privilege': u'2', u'ipmi_username': u'', u'jmx_available': u'0', u'jmx_disable_until': u'0', u'jmx_error': u'', u'jmx_errors_from': u'0', u'lastaccess': u'0', u'maintenance_from': u'0', u'maintenance_status': u'0', u'maintenance_type': u'0', 
u'maintenanceid': u'0', u'maintenances': [], u'proxy_hostid': u'11223', u'snmp_available': u'2', u'snmp_disable_until': u'0', u'snmp_errors_from': u'0', u'status': u'0'}, {u'available': u'0', u'disable_until': u'0', u'error': u'', u'errors_from': u'0', u'groups': [{u'groupid': u'361'}], u'host': u'my3.hostname.local', u'hostid': u'144844', u'interfaces': {u'402972': {u'dns': u'my13.hostname.local', u'hostid': u'144844', u'interfaceid': u'402972', u'ip': u'1.1.1.1', u'main': u'1', u'port': u'161', u'type': u'2', u'useip': u'1'}, u'421123': {u'dns': u'my12.hostname.local', u'hostid': u'321', u'interfaceid': u'421312', u'ip': u'2.2.2.2', u'type': u'3', u'useip': u'1'}, u'475675': {u'dns': u'my13.hostname.local', u'hostid': u'566454', u'interfaceid': u'8987798', u'ip': u'3.3.3.3', u'type': u'1', u'useip': u'1'}}, u'ipmi_authtype': u'-1', u'ipmi_available': u'0', u'ipmi_disable_until': u'0', u'ipmi_error': u'', u'ipmi_errors_from': u'0', u'ipmi_password': u'', u'ipmi_privilege': u'2', u'ipmi_username': u'', u'jmx_available': u'0', u'jmx_disable_until': u'0', u'jmx_error': u'', u'jmx_errors_from': u'0', u'lastaccess': u'0', u'maintenance_from': u'0', u'maintenance_status': u'0', u'maintenance_type': u'0', u'maintenanceid': u'0', u'maintenances': [], u'proxy_hostid': u'11223', u'snmp_available': u'2', u'snmp_disable_until': u'0', u'snmp_errors_from': u'0', u'status': u'0'}] } 
  • Hmm, a good package, I did not know about it (pandas). And what would it look like with host substituted for dns? - user231551
  • Actually, never mind, it will do as is. I will look at what pandas can do when I have time and rework it as needed. Thanks! - user231551
  • @user231551, the solution would be an order of magnitude simpler if interfaces were a list of dictionaries, for example: pastebin.com/hZRDYy3Q - MaxU

For clarity, here is an example of how to get the required data using for loops:

    >>> [{interface['dns'].encode(): {'ip': interface['ip'].encode()}}
    ...  for host in a['hosts']
    ...  for interface in host['interfaces'].itervalues()
    ...  if interface['type'] == '2']
    [{'my.hostname.local': {'ip': '1.1.1.1'}}, {'my01.hostname.local': {'ip': '1.1.1.1'}}, {'my13.hostname.local': {'ip': '1.1.1.1'}}]

The code assumes that interface['dns'] contains no internationalized domain names, so the result can always be represented in ascii ( sys.getdefaultencoding() ).
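If the goal is the single dictionary b from the question, keyed by the host field rather than by dns, the same loops can be written out explicitly. This is a rough sketch, not a definitive implementation: the function name hosts_to_ips, the wanted_type parameter, and the shortened sample data are assumptions made for illustration, and it assumes that the first interface with the requested type is the one wanted. It uses .values() so that it also runs on Python 3.

```python
def hosts_to_ips(data, wanted_type='2'):
    """Map each host name to the ip of its first interface of `wanted_type`.

    Hypothetical helper; the "first match wins" rule is an assumption.
    """
    result = {}
    for host in data['hosts']:
        for interface in host['interfaces'].values():
            if interface['type'] == wanted_type:
                result[host['host']] = {'ip': interface['ip']}
                break  # stop at the first matching interface for this host
    return result


# Shortened sample in the same shape as the dictionary in the question.
sample = {'hosts': [
    {'host': 'my.hostname.local',
     'interfaces': {'302972': {'ip': '1.1.1.1', 'type': '2'}}},
    {'host': 'my2.hostname.local',
     'interfaces': {'321123': {'ip': '2.2.2.2', 'type': '3'},
                    '302973': {'ip': '1.1.1.1', 'type': '2'}}},
]}

b = hosts_to_ips(sample)
print(b)
```

Hosts without any interface of the requested type are simply left out of the result.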

u'' is the textual representation of a unicode object in Python (how it looks in source code). Text should not be turned into bytes (which is what .encode('utf-8') does) unless that is necessary (for example, when some API accepts only bytes). If you use bytes to represent text, you can end up mixing text in different encodings and get garbage in the output without noticing. To rule out this source of errors, str in Python 3 is the counterpart of the unicode type from Python 2.
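The "garbage without noticing" risk can be shown with a short sketch. The sample word and the choice of latin-1 as the wrongly assumed encoding are arbitrary and only for illustration. The point is that no exception is raised when bytes are decoded with the wrong encoding.

```python
# The Russian word "privet", written with escapes so the source stays ASCII.
text = u'\u043f\u0440\u0438\u0432\u0435\u0442'

# Once encoded, the bytes no longer carry any record of their encoding.
utf8_bytes = text.encode('utf-8')

# Decoding with the wrong encoding succeeds silently and produces garbage.
garbled = utf8_bytes.decode('latin-1')

# The damage is only reversible if you know which wrong encoding was applied.
recovered = garbled.encode('latin-1').decode('utf-8')

print(len(text), len(garbled), recovered == text)
```

Keeping text as unicode (str in Python 3) inside the program and encoding only at the boundaries avoids this class of error.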

The code in the example above works because in Python 2 a str object containing data in the ascii range ( sys.getdefaultencoding() ) is implicitly converted to unicode when needed:

    >>> u'1' == '1'
    True
    >>> u'1' in {u'1': 'a'}
    True
    >>> '1' in {u'1': 'a'}
    True
    >>> {u'1': 'a'}[u'1']
    'a'
    >>> {u'1': 'a'}['1']
    'a'
  • Yes, that is what I did, but my version is longer than yours: pastebin.com/h7A2Td7E . I would gladly use python3 to avoid being bothered by the silly u' , but alas, that is not an option. Thanks for your version! P.S. You have a lot of reputation, announce a contest for the shortest code :) - user231551
  • @user231551: 1- it is better to include your code, along with the specific problems with it, in the question itself, so that it is clear exactly what you want to find and why the current approach does not suit you (if it does suit you, you can post it as your own answer). 2- The code in my answer is the most straightforward and readable I could make it. Brevity is a poor criterion (you can create a new question with the competition tag, but that would be a different question with different criteria for judging code). See the tag description for the rules for such questions. - jfs
  • Um, the contest was a joke, you know, "haha :)" and so on. I finished my "answer" almost at the same time as @MaxU, so I did not publish it. In general, I wrote in the question that I basically know how to solve this, but I do not really like my solution. If I am not mistaken, pep8 leans toward "brevity" when it comes to three-level nested loops. But thanks anyway. - user231551