The range for x and y is from 0 to 99.
I am currently doing it like this:
excludeFromTrainingSet = []
while len(excludeFromTrainingSet) < 4000:
tempX = random.randint(0, 99)
tempY = random.randint(0, 99)
if [tempX, tempY] not in excludeFromTrainingSet:
excludeFromTrainingSet.append([tempX, tempY])
But it takes ages and I really need to speed this up.
Any ideas?
Vincent Savard has an answer that’s almost twice as fast as the first solution offered here.
Here’s my take on it. It requires tuples instead of lists for hashability:
Just make sure that the limit is sane as other answerers have pointed out. For sane input, this is better algorithmically O(n) as opposed to O(n^2) because of the set instead of list. Also, python is much more efficient about loading locals than globals so always put this stuff in a function.
EDIT: Actually, I’m not sure that they’re O(n) and O(n^2) respectively because of the probabilistic component but the estimations are correct if n is taken as the number of unique elements that they see. They’ll both be slower as they approach the total number of available spaces. If you want an amount of points which approaches the total number available, then you might be better off using:
This will be a memory hog but is sure to be faster as the density of points increases because it avoids looking at the same point more than once. Another option worth profiling (probably better than last one) would be