I have some tagged data that I am processing with lxml. Before I open a file I do not know which of three element types it contains: it could have one, two, or all three types, and any number of instances of each.
I need some information about these elements that is contained in their child tags:
<element_type_1>
    <name>joe smith</name>
</element_type_1>
<element_type_2>
    <name>mary smith</name>
</element_type_2>
<element_type_3>
    <name>patrick smith</name>
</element_type_3>
In this case I have all three types but only one of each; however, there could be an arbitrarily large number of any type.
I am getting the elements by calling cssselect three times in my function:
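from collections import defaultdict

def get_types(myTree):
    type_dict = defaultdict(list)
    # one cssselect() call, and therefore one search of the tree, per element type
    type_dict['type_1'] = myTree.cssselect('element_type_1')
    type_dict['type_2'] = myTree.cssselect('element_type_2')
    type_dict['type_3'] = myTree.cssselect('element_type_3')
    return type_dict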
This seems overly redundant.
Am I missing something that would clean this up a bit?
FYI, I am doing this because for each type I have to match some other data from a related document.
The early answers suggest I need to clarify a bit – I want to avoid running through the tree three times.
You could do this:
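One possible single-pass approach, offered as a minimal sketch rather than a definitive implementation: walk the tree once with `iter()` and group the elements by tag. The `TAG_TO_KEY` mapping, the `<root>` wrapper, and the sample document are assumptions made for illustration. The function only relies on `iter()` and `.tag`, which lxml elements share with the standard library's ElementTree, so the import falls back to the standard library if lxml is not installed.

```python
from collections import defaultdict

try:
    from lxml import etree
except ImportError:  # fallback so the sketch still runs without lxml
    import xml.etree.ElementTree as etree

# Hypothetical mapping from tag name to the dictionary key used in the question.
TAG_TO_KEY = {
    "element_type_1": "type_1",
    "element_type_2": "type_2",
    "element_type_3": "type_3",
}

def get_types(tree):
    """Group the elements of interest by type in one walk over the tree."""
    type_dict = defaultdict(list)
    for element in tree.iter():
        # Comments, processing instructions, and other tags have no entry.
        key = TAG_TO_KEY.get(element.tag)
        if key is not None:
            type_dict[key].append(element)
    return type_dict

# Sample data from the question, wrapped in an assumed <root> element.
SAMPLE = """
<root>
    <element_type_1><name>joe smith</name></element_type_1>
    <element_type_2><name>mary smith</name></element_type_2>
    <element_type_3><name>patrick smith</name></element_type_3>
</root>
"""

if __name__ == "__main__":
    grouped = get_types(etree.fromstring(SAMPLE))
    for key in sorted(grouped):
        print(key, [el.findtext("name") for el in grouped[key]])
```

With lxml specifically, `tree.iter("element_type_1", "element_type_2", "element_type_3")` should yield only the matching elements, which would make the `None` check unnecessary. A type that does not occur in the file simply has no key in the result, and because the result is a `defaultdict`, looking it up gives an empty list.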