Given a set of strings containing numbers, how can I find the strings that are supersets of the others? For example, if the strings ‘139 24’ and ‘139 277 24’ both appear, I want to keep only ‘139 277 24’, since ‘139 24’ can be found inside it. The numbers may appear in any order inside a string.
'24'
'277'
'277 24'
'139 24'
'139 277 24'
'139 277'
'139'
'136 24'
'136 277 24'
'136 277'
'136'
'136 139 24'
'136 139 277 24'
'136 139 277'
'136 139'
'246'
The result for the above data is given below.
'136 139 277 24'
'246'
Edit: I am splitting each string, putting the individual numbers in a set, and then comparing that set against the sets created from the whole list. I can find a solution using this approach, but I think there should be a more elegant way to do the same thing.
I was trying the following code and felt that it is becoming unnecessarily complex.
# First create a set of tuples
allSeqsTuple = set()
for seq in allSeqs:  # allSeqs stores the sequences described above
    x = seq.split()
    allSeqsTuple.add(tuple(x))
# For each line in 'allSeqs', find whether all of its items are in some entry of 'allSeqsTuple'.
for line in allSeqs:
    x = set(line.split())
    result = findContainment(x, allSeqsTuple)
......
......
def findContainment(x, allSeqsTuple):
    contained = False
    for y in allSeqsTuple:
        cntnd = bool(x-set(y))
        if (cntnd):
            contained = True
            continue
        else:
            break
    return contained
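Let’s make a laundry list of the major players in this problem:
- strings, e.g. '24 139 277'
- the <= set operator, which tests whether one set is contained in another
- sets of number-strings, e.g. set(['24', '139', '277'])
We are given a list of strings, but what we’d really like, and what would be more useful, is a list of sets, or more precisely frozensets.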
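The conversion could look something like the sketch below. The name raw_strings and the shortened sample list are assumptions made for illustration; the real input is the list from the question.

```python
# Hypothetical sample input, a shortened version of the data in the question.
raw_strings = ['139 24', '139 277 24', '24 139 277', '246']

# Split each string on whitespace and freeze the resulting set of tokens.
strings = [frozenset(s.split()) for s in raw_strings]

# Once a string is a set, the order of its numbers no longer matters.
print(strings[1] == strings[2])  # True
```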
The reason for frozensets is explained in the technical notes below. The reason we want sets at all is that they have a convenient subset comparison operator, <=.
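A small illustration of that operator, using two strings from the question (the variable names are only for this example):

```python
small = frozenset('139 24'.split())
big = frozenset('139 277 24'.split())
other = frozenset('246'.split())

print(small <= big)    # True: every number in small also appears in big
print(big <= small)    # False: '277' is missing from small
print(small <= other)  # False: the two sets share nothing
```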
This is exactly what we need to determine if one string is a superstring of another.
So, basically, we want to:
- Start with an empty set: superstrings = set()
- Iterate through the strings: for s in strings.
- As we examine each s in strings, we will add new ones to superstrings if they are not a subset of an item already in superstrings.
- For each s, iterate through the set of superstrings: for sup in superstrings.
  - Check if s <= sup, that is, if s is a subset of sup. If so, quit the loop, since s is smaller than some known superstring.
  - Check if sup <= s, that is, if s is a superset of some item in superstrings. In this case, remove the item from superstrings and replace it with s.
Technical notes:
- Because we are removing items from superstrings, we cannot also iterate over superstrings itself. So, instead, iterate over a copy, for example with for sup in superstrings.copy().
- We want superstrings to be a set of sets. But the items in a set have to be hashable, and sets themselves are not hashable. Frozensets are, so it is possible to have a set of frozensets. This is why we converted strings into a list of frozensets.
Applied to the data in the question, this procedure yields the two sets corresponding to '136 139 277 24' and '246'.
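The steps and technical notes above can be put together roughly as follows. This is a sketch under the stated assumptions, not necessarily the exact code originally posted; the function name find_superstrings and the variable raw_strings are introduced here for illustration.

```python
def find_superstrings(raw_strings):
    """Return the frozensets that are not contained in any other one."""
    strings = [frozenset(s.split()) for s in raw_strings]
    superstrings = set()
    for s in strings:
        # Iterate over a copy, because superstrings may shrink in the loop.
        for sup in superstrings.copy():
            if s <= sup:
                # s is already covered by a known superstring: skip it.
                break
            if sup <= s:
                # s supersedes sup: drop sup; s itself is added below.
                superstrings.discard(sup)
        else:
            # No break happened, so nothing known contains s.
            superstrings.add(s)
    return superstrings


# The data from the question.
raw_strings = [
    '24', '277', '277 24', '139 24', '139 277 24', '139 277', '139',
    '136 24', '136 277 24', '136 277', '136', '136 139 24',
    '136 139 277 24', '136 139 277', '136 139', '246',
]
result = find_superstrings(raw_strings)
for sup in result:
    print(' '.join(sorted(sup)))
```

The for/else construct does the "replace" step: the else branch runs only when the inner loop finished without a break, which covers both a brand-new s and an s that has just displaced smaller entries. The numbers inside each printed line come out sorted, since a set does not remember the original order.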
It turns out this is faster than pre-sorting the strings.
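To check this on your own data, one possible pre-sorting variant is sketched below, wrapped in a timeit call. This is only a guess at what "pre-sorting" means here: sort the sets by size, largest first, so that nothing ever has to be removed. The function name superstrings_presorted and the sample list are assumptions, and no timings are shown because they depend on the input and the machine.

```python
import timeit


def superstrings_presorted(raw_strings):
    # Deduplicate, then sort largest-first: a later set can never be a
    # strict superset of an earlier one, so kept entries are final.
    sets = sorted({frozenset(s.split()) for s in raw_strings},
                  key=len, reverse=True)
    kept = []
    for s in sets:
        if not any(s <= sup for sup in kept):
            kept.append(s)
    return set(kept)


# Hypothetical sample, a shortened version of the data in the question.
sample = ['24', '277 24', '139 277 24', '136 139 277 24', '136', '246']
presorted_result = superstrings_presorted(sample)

# Time 1000 runs; the same harness can be pointed at any other variant.
elapsed = timeit.timeit(lambda: superstrings_presorted(sample), number=1000)
print(elapsed)
```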