The input list can contain more than 1 million numbers. When I run the following code with a smaller 'repeats' value, it's fine:
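
    import random

    def sample(x):
        # Draw 1,000,000 items without replacement; list(x) copies the input first.
        length = 1000000
        new_array = random.sample((list(x)),length)
        return new_array

    def repeat_sample(x):
        repeats = 100
        list_of_samples = []
        for i in range(repeats):
            # Every full sample stays in memory until the function returns.
            list_of_samples.append(sample(x))
        return list_of_samples

    repeat_sample(large_array)
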
However, using a high number of repeats, such as the 100 above, results in a MemoryError. The traceback is as follows:

    Traceback (most recent call last):
      File "C:\Python31\rnd.py", line 221, in <module>
        STORED_REPEAT_SAMPLE = repeat_sample(STORED_ARRAY)
      File "C:\Python31\rnd.py", line 129, in repeat_sample
        list_of_samples.append(sample(x))
      File "C:\Python31\rnd.py", line 121, in sample
        new_array = random.sample((list(x)),length)
      File "C:\Python31\lib\random.py", line 309, in sample
        result = [None] * k
    MemoryError

I am assuming I’m running out of memory. I do not know how to get around this problem.
Thank you for your time!
Expanding on my comment:
Let’s say the processing you do to each sample is to calculate its mean.
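As a rough sketch of that starting point (the helper name `calc_means` and the `length` and `repeats` parameters are my own placeholders, not from the question), the straightforward version builds every sample first and only then reduces each one to its mean:

```python
import random

def repeat_sample(x, length, repeats):
    # Builds and keeps `repeats` full samples, as in the question.
    return [random.sample(list(x), length) for _ in range(repeats)]

def calc_means(samples):
    # Hypothetical processing step: one mean per stored sample.
    return [sum(s) / len(s) for s in samples]
```

With `length = 1000000` and `repeats = 100`, `repeat_sample` has to hold 100 million list slots at once before `calc_means` even starts.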
This is going to make you sweat holding all those lists in memory. You can get it much lighter like this:
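A minimal sketch of the lighter version, assuming the mean is all you need from each sample. The names `mean_of_sample` and `repeat_means` are illustrative; each sample is reduced to a single float before the next one is drawn, so only one sample list is alive at a time:

```python
import random

def mean_of_sample(x, length):
    # The sample list becomes garbage as soon as this function returns.
    new_array = random.sample(x, length)
    return sum(new_array) / len(new_array)

def repeat_means(x, length, repeats):
    # Stores one float per repeat instead of one list per repeat.
    return [mean_of_sample(x, length) for _ in range(repeats)]
```

Here `x` is assumed to already be a sequence such as a list, which also avoids copying it with `list(x)` on every call.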
But that’s still not good enough… You can do it all while only ever constructing your list of results:
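One way to sketch this, assuming the population supports `len()` and iteration: make a single pass over the data and pick each item with probability (still needed) / (still remaining), a technique usually called selection sampling (Knuth's Algorithm S). This is a swapped-in approach rather than what `random.sample` does internally, and the name `sampling_mean` and its parameters are hypothetical:

```python
import random

def sampling_mean(population, k, times):
    """Return `times` means, each over k items drawn without replacement."""
    n = len(population)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(population)")
    means = []
    for _ in range(times):
        total = 0.0
        needed = k
        for seen, value in enumerate(population):
            # Pick this item with probability needed / (items not yet visited).
            if random.random() * (n - seen) < needed:
                total += value
                needed -= 1
                if needed == 0:
                    break
        means.append(total / k)
    return means
```

The trade-off is that each repeat may walk the whole population, so it costs O(n) time rather than O(k), in exchange for never holding a sample in memory.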
Now, can your algorithm be streamlined like this?