I hope I’m able to explain this problem clearly. I’m a Python experimenter (in case the query below appears naive).
Assume that I have a dataset of the form:
a = ( ('309','308','308'), ('309','308','307'), ('308', '309','306', '304'))
Let me call each tuple, such as ('309','308','308'), a path.
I want to find the count of:
a. Count('309','308', <any word>)
b. Count('309',<any word>,'308')
and all possible permutations.
I’m thinking it’s some kind of regex that will help me achieve this search. The number of paths I have goes up to 50,000.
Can anyone suggest how I can do this kind of operation in Python? I explored tries and radix trees, but I don’t think they’ll help me.
Thanks,
Sagar
You could use collections.Counter to do this.
I’m also using extended tuple unpacking here, which didn’t exist before Python 3.x and is only needed if you have tuples of uncertain length. In Python 2.x, you could instead unpack a fixed number of elements.
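A minimal sketch of both variants, assuming the goal is to count paths by their first two elements. The names `counts`, `counts_fixed` and `rest` are only illustrative:

```python
from collections import Counter

a = (('309', '308', '308'), ('309', '308', '307'), ('308', '309', '306', '304'))

# Python 3: *rest soaks up everything after the first two elements,
# so any path with at least two elements unpacks cleanly.
counts = Counter((x, y) for (x, y, *rest) in a)
print(counts[('309', '308')])  # 2

# Fixed-length unpacking (also valid in Python 2.7): every path must
# have exactly three elements, so only the first two paths are used here.
counts_fixed = Counter((x, y) for (x, y, z) in a[:2])
print(counts_fixed[('309', '308')])  # 2
```

This answers Count('309', '308', <any word>) with a single lookup once the Counter has been built.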
I couldn’t say exactly how efficient this would be, but I don’t expect it to be bad: building the Counter is a single pass over the paths.
A Counter has a dict-like syntax, so you look up the count for a key just as you would in a dict.
Edit: You mentioned the paths might be of any length greater than one. In that case you could run into problems, as they won’t unpack if they are shorter than the required length. The solution is to change the generator expression to ignore any that are not in the required format.
E.g:
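A minimal sketch of that filter together with the dict-like lookup, assuming "required format" means "at least two elements". The one-element path and the name `long_enough` are added here purely for illustration:

```python
from collections import Counter

a = (('309', '308', '308'), ('309', '308', '307'),
     ('308', '309', '306', '304'), ('309',))  # last path is deliberately too short

# Filter first, then unpack: the length check has to happen before
# the tuple unpacking, otherwise the short path raises ValueError.
long_enough = (path for path in a if len(path) >= 2)
counts = Counter((x, y) for (x, y, *rest) in long_enough)

print(counts[('309', '308')])  # 2
print(counts[('309', '999')])  # 0
```

Unlike a plain dict, a Counter returns 0 for a key it has never seen instead of raising KeyError.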
If you need a variable-length count, the easiest way is to slice each path to the length you need.
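A minimal sketch of the slicing approach, assuming you want to count the first `n` elements of each path. The helper name `count_prefixes` is hypothetical:

```python
from collections import Counter

a = (('309', '308', '308'), ('309', '308', '307'), ('308', '309', '306', '304'))

def count_prefixes(paths, n):
    """Count how often each leading run of n elements occurs."""
    # Slicing a tuple yields a tuple, which is hashable and so usable as a key.
    # Paths shorter than n are skipped rather than counted as partial prefixes.
    return Counter(path[:n] for path in paths if len(path) >= n)

print(count_prefixes(a, 2)[('309', '308')])  # 2
print(count_prefixes(a, 1)[('309',)])        # 2
```

Note that the keys are always tuples, even for `n = 1`.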
Of course, this only works for contiguous runs. If you want to pick columns individually, you have to do a little more work.
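One possible way to do that extra work, sketched under the assumption that the columns are given as a non-empty sequence of non-negative indices. The helper name `count_columns` is hypothetical:

```python
from collections import Counter

a = (('309', '308', '308'), ('309', '308', '307'), ('308', '309', '306', '304'))

def count_columns(paths, columns):
    """Count combinations of values found at the given column indices."""
    needed = max(columns) + 1  # shortest path length that has every column
    return Counter(
        tuple(path[i] for i in columns)
        for path in paths
        if len(path) >= needed
    )

# Count('309', <any word>, '308'): look only at columns 0 and 2.
print(count_columns(a, (0, 2))[('309', '308')])  # 1
```

Every column left out of `columns` acts as the <any word> wildcard, so each wildcard pattern from the question corresponds to one choice of columns.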