I need a Python (2.7) object that behaves like a set (fast insertion, deletion, and membership checking) but can also return a random element. Answers to previous questions on Stack Overflow suggest things like:
import random
random.sample(mySet, 1)
But this is quite slow for large sets (it runs in O(n) time).
Other solutions aren’t random enough (they depend on the internal representation of Python sets, which produces very non-random results):
for e in mySet:
    break
# e is now an element from mySet
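To make the non-randomness concrete, here is a minimal sketch. The helper name first_by_iteration and the sample data are made up for illustration. As long as the set is not modified between calls, iteration starts at the same element every time, so repeated "picks" never vary:

```python
def first_by_iteration(s):
    """Return whatever element iteration happens to yield first."""
    for e in s:
        return e


my_set = set(range(1000))  # hypothetical example data

# Collect the distinct results of 100 consecutive "picks".
picks = set(first_by_iteration(my_set) for _ in range(100))

# The set is unchanged between calls, so every pick is the same element
# and picks ends up holding exactly one value.
```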
I coded my own rudimentary class, which has constant-time lookup, deletion, and random selection.
import random
class randomSet:
    def __init__(self):
        self.dict = {}  # item -> index of that item in self.list
        self.list = []  # items in arbitrary order
    def add(self, item):
        if item not in self.dict:
            self.dict[item] = len(self.list)
            self.list.append(item)
    def addIterable(self, item):
        for a in item:
            self.add(a)
    def delete(self, item):
        if item in self.dict:
            index = self.dict[item]
            if index == len(self.list)-1:
                del self.dict[self.list[index]]
                del self.list[index]
            else:
                # move the last item into the vacated slot
                self.list[index] = self.list.pop()
                self.dict[self.list[index]] = index
                del self.dict[item]
    def getRandom(self):
        if self.list:
            return self.list[random.randint(0,len(self.list)-1)]
    def popRandom(self):
        if self.list:
            index = random.randint(0,len(self.list)-1)
            if index == len(self.list)-1:
                del self.dict[self.list[index]]
                return self.list.pop()
            returnValue = self.list[index]
            # move the last item into the vacated slot
            self.list[index] = self.list.pop()
            self.dict[self.list[index]] = index
            del self.dict[returnValue]
            return returnValue
Are there any better implementations for this, or any big improvements to be made to this code?
I think the best way to do this would be to use the MutableSet abstract base class in collections (collections.abc in Python 3). Inherit from MutableSet, and then define add, discard, __len__, __iter__, and __contains__; also rewrite __init__ to optionally accept a sequence, just like the set constructor does. MutableSet provides mixin definitions of the remaining mutable-set operations (remove, pop, clear, the in-place operators |=, &=, -=, and ^=, plus the comparison and binary set operators) based on those methods. That way you get most of the set interface cheaply. (And if you do this, the job of addIterable is done for you by the |= operator.)

discard in the standard set interface appears to be what you have called delete here, so rename delete to discard. Also, instead of giving popRandom its own removal logic, you could define it in terms of the other methods: call getRandom to pick an item, discard that item, and return it. That way you don’t have to maintain two separate item removal methods.
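The suggestions above can be sketched roughly as follows. This is one possible shape, not a definitive implementation: the class name RandomSet and the attribute names _index and _items are my own choices, the import fallback is there so the sketch runs on both Python 2.7 and Python 3, and getRandom here uses random.choice, which raises IndexError on an empty set instead of returning None as the original does.

```python
import random

try:  # Python 3.3+
    from collections.abc import MutableSet
except ImportError:  # Python 2.7
    from collections import MutableSet


class RandomSet(MutableSet):
    """Set with O(1) add, discard, membership test and random choice."""

    def __init__(self, iterable=()):
        self._index = {}  # item -> position of that item in self._items
        self._items = []  # items in arbitrary order
        for item in iterable:
            self.add(item)

    # The five abstract methods that MutableSet requires.
    def __contains__(self, item):
        return item in self._index

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, item):
        if item not in self._index:
            self._index[item] = len(self._items)
            self._items.append(item)

    def discard(self, item):
        if item not in self._index:
            return  # like set.discard, silently ignore missing items
        index = self._index[item]
        last = self._items[-1]
        self._items[index] = last  # copy the last item into the vacated slot
        self._index[last] = index  # record its new position
        self._items.pop()          # drop the final slot
        del self._index[item]      # forget the removed item

    # Random access, built on top of the methods above.
    def getRandom(self):
        return random.choice(self._items)  # IndexError if the set is empty

    def popRandom(self):
        item = self.getRandom()
        self.discard(item)  # the only place removal logic lives
        return item


s = RandomSet([1, 2, 3])
s |= [3, 4]             # inherited from MutableSet; replaces addIterable
s.remove(1)             # inherited; raises KeyError for a missing item
picked = s.popRandom()  # removes and returns a uniformly chosen item
```

Note that the inherited pop method removes the first item in iteration order, not a random one, which is why popRandom is kept as a separate method.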
Finally, in your item removal method (delete now, discard according to the standard set interface), you don’t need the inner if statement. Instead of testing whether index == len(self.list) - 1, simply swap the final item in the list with the item at the index to be removed, and update the reverse-indexing dictionary to record the final item’s new position. Then pop the last item from the list and remove the deleted item from the dictionary. This works whether index == len(self.list) - 1 or not.
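The branch-free removal step can be isolated as a small function operating on a plain list and dictionary. This is a sketch under my own naming (swap_remove, items, and index_of are hypothetical); it overwrites the vacated slot with the last item instead of doing a full swap, which has the same effect once the final slot is popped.

```python
def swap_remove(items, index_of, item):
    """Remove item from the list items and the dict index_of in O(1).

    index_of maps each item to its position in items.
    """
    index = index_of[item]
    last = items[-1]
    items[index] = last     # move the last item into the vacated slot
    index_of[last] = index  # record the moved item's new position
    items.pop()             # drop the final slot
    # Delete the entry last: if item is itself the last element, the
    # assignment above re-wrote its entry, and this line removes it.
    del index_of[item]


# Case 1: removing an item that is not at the end.
items = ["a", "b", "c"]
index_of = {"a": 0, "b": 1, "c": 2}
swap_remove(items, index_of, "a")
# items is now ["c", "b"] and index_of is {"b": 1, "c": 0}

# Case 2: removing the final item; the move is a harmless self-assignment.
swap_remove(items, index_of, "b")
# items is now ["c"] and index_of is {"c": 0}
```

The order of the last two dictionary operations matters: deleting the entry before updating it would re-insert the removed item whenever that item happens to be the final one.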