I have a data structure like this:
items = [
    ['Schools', '', '', '32'],
    ['Schools', 'Primary schools', '', '16'],
    ['Schools', 'Secondary schools', '', '16'],
    ['Schools', 'Secondary schools', 'Special ed', '8'],
    ['Schools', 'Secondary schools', 'Non-special ed', '8'],
]
It’s a list of spending items. Some are aggregates, e.g. items[0] is aggregate spending on all schools, and items[2] is aggregate spending on secondary schools. Those that are not aggregates are items[1], items[3] and items[4].
How can I elegantly reduce the list so it only shows non-aggregate items? In pseudocode:
for each item in items
    check if item[1] is blank, if it is
        check if item[0] matches another item’s[0]
            if it does and if that item’s[1] isn’t blank
                delete item
    check if item[2] is blank, if it is
        check if item[1] matches another item’s[1]
            if it does and if that item’s[2] isn’t blank
                delete item
Here’s my (lame!) attempt so far:
from itertools import groupby

items_to_remove = []
for i in range(len(items)):
    if items[i]:
        if items[i][1] == "":
            # No subcategory: aggregate if another row in this category has one.
            for other_item in items:
                if items[i][0] == other_item[0] and other_item[1] != "":
                    items_to_remove.append(i)
                    break
        elif items[i][2] == "":
            # No sub-subcategory: aggregate if another row in this subcategory has one.
            for other_item in items:
                if items[i][1] == other_item[1] and other_item[2] != "":
                    items_to_remove.append(i)
                    break
new_items = [key for key, _ in groupby(items_to_remove)]
new_items.sort(reverse=True)
# Delete from the highest index down so the remaining indices stay valid.
for number in new_items:
    del items[number]
This is just so ugly. What can I do better?
NB: I could use dictionaries instead of lists, if that would make life easier 🙂
The idea is to find the “key” of each item: all of the item’s entries except the last value, concatenated.
An aggregate item’s key is always a prefix of the keys of the items that follow it, so we can use this test to detect aggregate items and dismiss them.
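In code, the prefix test might look roughly like this. It is only a minimal sketch: the function name drop_aggregates is made up for illustration, and it assumes the amount is always the last field and that every aggregate is immediately followed by one of its own sub-items.

```python
def drop_aggregates(items):
    """Return the items whose key is not a prefix of the next item's key."""
    # Key = every field except the last one (the amount), concatenated.
    keys = ["".join(item[:-1]) for item in items]
    result = []
    for i, item in enumerate(items):
        is_last = i == len(items) - 1
        # An aggregate is directly followed by one of its sub-items,
        # whose key starts with the aggregate's key.
        if is_last or not keys[i + 1].startswith(keys[i]):
            result.append(item)
    return result


items = [
    ['Schools', '', '', '32'],
    ['Schools', 'Primary schools', '', '16'],
    ['Schools', 'Secondary schools', '', '16'],
    ['Schools', 'Secondary schools', 'Special ed', '8'],
    ['Schools', 'Secondary schools', 'Non-special ed', '8'],
]

for item in drop_aggregates(items):
    print(item)
```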
On your input, this algorithm keeps only the three non-aggregate items: items[1], items[3] and items[4].
Note:
This assumes the items are ordered as in a tree, with each aggregate directly before its sub-items (as in your original data). If they are not, it will be (slightly) more complicated, as you will have to sort the keys before the loop (and keep track of which key belongs to which item).
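For the unordered case, one possible sketch is to sort (key, index) pairs, run the same prefix test on the sorted keys, and then use the saved indices to return the surviving items in their original order. The name drop_aggregates_unordered and the shuffled sample list are assumptions for illustration, and the sketch assumes no two items share the same key.

```python
def drop_aggregates_unordered(items):
    """Prefix test for items in arbitrary order; keeps the input order."""
    # Pair each key with the index of its item, then sort by key so that
    # an aggregate ends up directly before the items it summarises.
    keyed = sorted(("".join(item[:-1]), i) for i, item in enumerate(items))
    keep = []
    for pos, (key, index) in enumerate(keyed):
        is_last = pos == len(keyed) - 1
        if is_last or not keyed[pos + 1][0].startswith(key):
            keep.append(index)
    # Restore the original input order of the surviving items.
    return [items[i] for i in sorted(keep)]


# The sample data from the question, deliberately out of order.
shuffled = [
    ['Schools', 'Secondary schools', 'Special ed', '8'],
    ['Schools', '', '', '32'],
    ['Schools', 'Secondary schools', 'Non-special ed', '8'],
    ['Schools', 'Secondary schools', '', '16'],
    ['Schools', 'Primary schools', '', '16'],
]

for item in drop_aggregates_unordered(shuffled):
    print(item)
```

Sorting works here because every key that starts with a given aggregate's key sorts immediately after that aggregate, so checking only the next key is still enough.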