My data structures:
phase1_hits = {
'1.2.3.4': {'hits': 3, 'internal': '10.28.30.153', 'public_additional': ['8.2.17.14'], 'list': 'Red', 'internal_additional': ['10.17.100.74', '10.19.70.77', '10.28.30.153']},
'2.3.4.5': {'hits': 19, 'internal': '10.19.40.175', 'public_additional': ['1.2.227.49'], 'list': 'Red', 'internal_additional': ['10.19.40.175']},
'12.23.34.45': {'hits': 52, 'internal': '192.168.164.32', 'public_additional': ['8.2.17.14'], 'list': 'Orange', 'internal_additional': ['192.168.164.32', '192.168.164.42', '192.168.164.49']},
'8.8.8.8': {'hits': 5, 'internal': '192.168.1.10', 'public_additional': ['8.8.87.153', '1.2.3.4'], 'list': 'Green', 'internal_additional': ['192.168.168.250']}
}
phase2_hits = {
97536: {'ip.dst': ['8.2.17.14'], 'ip.src': ['10.28.30.153']},
60096: {'ip.dst': ['8.2.17.14'], 'ip.src': ['192.168.164.42']},
43140: {'ip.dst': ['8.2.17.9'], 'ip.src': ['10.153.134.201']},
43789: {'ip.dst': ['10.28.30.153'], 'ip.src': ['8.2.17.9']},
89415: {'ip.dst': ['8.2.17.14'], 'ip.src': ['10.153.134.200']}
}
Facts about the data structure (maybe none of this matters??):
- phase1_hits key will always be a public IP
- phase1_hits public_additional and internal_additional could be empty
- phase2_hits private IPs could appear in ip.src or .dst
- phase2_hits ip.src and .dst will always only have a single item in the list (yes, I know it’s a silly structure but I have no control over it)
- just because a private IP appears in phase1_hits does not mean it will appear in phase2_hits; if it doesn’t, I only need the phase1_hits info on it
If a phase1_hits internal or internal_additional IP is seen in phase2_hits I want to extract the corresponding:
- phase1_hits key
- phase1_hits internal (or internal_additional – whichever was seen in phase2_hits)
- phase1_hits hits
- phase1_hits list
- phase2_hits key
- phase2_hits ip.src or .dst (whichever is opposite of the internal or internal_additional address being queried)
The key concept of the extraction is to match up which private IP(s) talked to which public IP(s). Also, if it would help I can restructure phase1_hits and use a different key.
Here’s the explanation:
You need to store the matches somewhere you can use or change them later. A list is the easiest choice, since it grows with the number of matches you find. A dictionary is doable, but it wouldn’t gain you much here, since you are collecting individual matchups rather than looking entries up by key.
You can traverse the keys in a dictionary by treating it much like a list (i.e., calling the .keys() function is not necessary, since you are not changing or deleting the keys in the dictionary).
I aliased the subdictionary for readability; it is, however, not necessary. On the following line, I took advantage of Python’s native list __add__ (the + operator), which simply concatenates two lists into a new list, because internal_ips can sometimes have multiple elements. Then, because you want the number of hits and the 'list' value in the subdict, I created color_list. Note I did not name it the same as the key, because doing so would shadow Python’s built-in list type.
The only new thing here is overlap. By using a list comprehension, we can find all the overlapping values (hence the name). You can call it what you want; just know that it will be populated with all the common values between the two. (You can use it too: the formula is basically [i for i in L1 if i in L2], where L1 and L2 are both lists.)
The if statement ensures that there is at least one overlap between the phase1_hits internal or internal_additional IPs and phase2_hits. If so, it will populate a tuple based on this info. (I chose a tuple since it’s immutable and you know its structure, but you can change it into a list if you want.) Once populated, it is then appended to the matches list.
Once done going through both loops you should have what you want.
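The code this explanation walks through is not shown here, so the following is only a sketch reconstructed from the description, not the original answer’s code. The names internal_ips, color_list, overlap and matches come from the explanation; find_matches, public_ip, packet_id and endpoints are assumed names chosen for illustration. It also assumes, as stated in the question, that ip.src and ip.dst each hold exactly one address.

```python
phase1_hits = {
    '1.2.3.4': {'hits': 3, 'internal': '10.28.30.153', 'public_additional': ['8.2.17.14'], 'list': 'Red',
                'internal_additional': ['10.17.100.74', '10.19.70.77', '10.28.30.153']},
    '2.3.4.5': {'hits': 19, 'internal': '10.19.40.175', 'public_additional': ['1.2.227.49'], 'list': 'Red',
                'internal_additional': ['10.19.40.175']},
    '12.23.34.45': {'hits': 52, 'internal': '192.168.164.32', 'public_additional': ['8.2.17.14'], 'list': 'Orange',
                    'internal_additional': ['192.168.164.32', '192.168.164.42', '192.168.164.49']},
    '8.8.8.8': {'hits': 5, 'internal': '192.168.1.10', 'public_additional': ['8.8.87.153', '1.2.3.4'], 'list': 'Green',
                'internal_additional': ['192.168.168.250']},
}
phase2_hits = {
    97536: {'ip.dst': ['8.2.17.14'], 'ip.src': ['10.28.30.153']},
    60096: {'ip.dst': ['8.2.17.14'], 'ip.src': ['192.168.164.42']},
    43140: {'ip.dst': ['8.2.17.9'], 'ip.src': ['10.153.134.201']},
    43789: {'ip.dst': ['10.28.30.153'], 'ip.src': ['8.2.17.9']},
    89415: {'ip.dst': ['8.2.17.14'], 'ip.src': ['10.153.134.200']},
}


def find_matches(phase1_hits, phase2_hits):
    """Return one tuple per (phase1 entry, phase2 entry) pair that shares an internal IP."""
    matches = []
    for public_ip in phase1_hits:              # iterating a dict yields its keys
        sub = phase1_hits[public_ip]           # alias the subdictionary for readability
        # list + list builds a new list; internal_additional may be empty
        internal_ips = [sub['internal']] + sub['internal_additional']
        hits = sub['hits']
        color_list = sub['list']               # not named "list", to avoid shadowing the built-in
        for packet_id in phase2_hits:
            src = phase2_hits[packet_id]['ip.src'][0]   # assumption: single-item lists
            dst = phase2_hits[packet_id]['ip.dst'][0]
            endpoints = [src, dst]
            # every internal IP of this phase1 entry that appears in this phase2 entry
            overlap = [ip for ip in internal_ips if ip in endpoints]
            if overlap:
                matched_ip = overlap[0]
                # the "opposite" address is whichever endpoint was not matched
                other_ip = dst if matched_ip == src else src
                matches.append((public_ip, matched_ip, hits, color_list, packet_id, other_ip))
    return matches


matches = find_matches(phase1_hits, phase2_hits)
for match in matches:
    print(match)
```

Each tuple is laid out as (phase1 key, matched internal IP, hits, list, phase2 key, opposite IP). If both ends of a phase2 entry happened to be internal IPs of the same phase1 entry, this sketch would report only the first one found; adjust the body of the if block if that case matters to you.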