I’m trying to find a reasonably efficient way of getting any single integer not in a set. So for example, if I have numbers={1, 4, 9}, then a valid result would be 3. I can do it like this:
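import random
def pick_unused(numbers):
    while True:  # retry until we draw a value that is not in the set
        n = random.randint(-100, 100)
        if n not in numbers:
            return n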
But I don’t want to be constrained to an arbitrary range (e.g. -100 -> 100) since I have no idea how large the set will be. The other option would be to iterate over every integer, but that would be terribly inefficient.
Does anyone have any suggestions about a better way of doing this?
Edit: Because of the number of questions about exactly what I’m trying to do, I’m updating this question to explain some of the background.
What I am actually trying to achieve is a mapping something like this: {a: 1, b: 2, c: 1} where a, b and c are object instances. The values are unique per group, so I can tell that a and c are in group 1 and b is in group 2. The actual number is irrelevant; it’s just a unique key for the group and doesn’t relate to anything outside this structure. The actual structure is a database table with two fields, both of which are indexed so that I can quickly find out, for example, what else is in the same group as a.
I need the unique number when I want to add a group. This doesn’t happen very often so it doesn’t have to be incredibly efficient, but since the amount of data can get quite large I need to keep the number of iterations down. I realise that there are a few simple ways of doing this with acceptable limitations, e.g. using randint with a large range (e.g. 1e6), or possibly even using a database function. But since I’ve been thinking about this, it’s become a matter of interest to find a neat solution for populating the values without hardcoded limits. Obviously memory limitations (e.g. the max size of an integer) still apply.
But there’s an infinite number of integer values, so there’s an infinite number of integer values not in a finite set, unless you’re talking about integers within a finite range (e.g. 16 bits).
The most efficient solution will depend on how complete the set of integers is: if it’s sparse, then picking a number at random is more likely to return one that is not in the set. If there are few gaps, then an optimised search will be more efficient. Both of these depend on having a sorted list of the set.
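As a rough sketch of the random approach without a hardcoded range, the window can be derived from the size of the set. The function name random_missing and the choice of window are my own assumptions, not something from the question: drawing from -len(numbers) to len(numbers) gives 2*len(numbers) + 1 candidates, of which at most len(numbers) are taken, so each draw succeeds more than half the time. This version relies on fast membership tests (a Python set) rather than a sorted list.

```python
import random

def random_missing(numbers):
    """Return a random integer that is not in the set `numbers`."""
    # Assumed window: [-len(numbers), len(numbers)] holds 2*len + 1 values,
    # and at most len of them can be in the set, so over half are free.
    span = len(numbers)
    while True:
        n = random.randint(-span, span)  # randint includes both endpoints
        if n not in numbers:
            return n
```

For numbers = {1, 4, 9} this draws from -3 to 3, where only 1 is taken, so it usually succeeds on the first try.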
Looking at the search method, it’s possible to speed up the search by partitioning the data: take the midpoint M of the lowest and highest indexes of the sorted list (index 0 and index last). If dataset[M] - dataset[0] > M, then there’s a gap in the first half; otherwise check whether dataset[last] - dataset[M] > last - M, in which case there’s a gap in the second half. Repeat the process on whichever half has a gap. If neither half has a gap, the values are contiguous and dataset[last] + 1 is not in the set.
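The partitioning idea could look something like the sketch below. It assumes the values are distinct integers and sorts them first; the function name find_missing is hypothetical. The gap test used here is that a sorted slice from index lo to index mid contains a gap exactly when data[mid] - data[lo] > mid - lo.

```python
def find_missing(numbers):
    """Return an integer that is not in `numbers` (any iterable of ints)."""
    data = sorted(set(numbers))  # the search needs sorted, distinct values
    if not data:
        return 0  # arbitrary choice for an empty input
    lo, hi = 0, len(data) - 1
    # No gap anywhere: the values are contiguous, so step past the end.
    if data[hi] - data[lo] == hi - lo:
        return data[hi] + 1
    # Invariant: at least one gap lies between data[lo] and data[hi].
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if data[mid] - data[lo] > mid - lo:
            hi = mid  # the left half contains a gap
        else:
            lo = mid  # the left half is contiguous, so the gap is on the right
    # data[lo] and data[hi] are now neighbours with a gap between them.
    return data[lo] + 1
```

With numbers = {1, 4, 9} this returns 2. The sort dominates the cost here; if the values already come back ordered, for example from an indexed database column, only the halving loop is needed, and that takes a logarithmic number of steps.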