Consider I am given a specific range (0 to 5,000,000) and I should generate 2,500,000 unique random numbers from this range. What is an efficient way to do this? I understand that is tough to get true random numbers.
I tried by checking if a number exists so that I can generate a new random number. But it takes hours to compute. Is there a better way to do this.
The reason behind this is, I have a vector of size 5,000,000. I want to shrink the vector exactly by half. i.e. delete random 50% of the elements from the vector.
#include <iostream>
#include <vector>
#include <stdlib.h>
#include <algorithm>
using namespace std;
#define NUMBER 2500000
#define RAND_START 0
#define RAND_END 5000000
unsigned int generate_random_number(int min, int max)
{
return min + (rand() % (unsigned int)(max - min + 1));
}
int main(int argc, char* argv[])
{
unsigned int count = 0, random_number;
vector<unsigned int> rand_vector;
do
{
count++;
random_number = generate_random_number(RAND_START,RAND_END);
// Tried to manually add a different number each time. But still not a considerable improvement in performance.
if (std::find(rand_vector.begin(), rand_vector.end(), random_number) != rand_vector.end())
{
if(random_number > count)
random_number = random_number - count;
else
random_number = random_number + count;
}
rand_vector.push_back(random_number);
sort(rand_vector.begin(), rand_vector.end());
rand_vector.erase(unique (rand_vector.begin(), rand_vector.end()), rand_vector.end());
}while (rand_vector.size() != NUMBER);
for (unsigned int i =0; i < rand_vector.size(); i++)
{
cout<<rand_vector.at(i)<<", ";
}
cout<<endl;
return 0;
}
Any better approach by which I can do this?
You seem to be locked on an idea that you have to pre-generate your random numbers somehow. Why? You said that the ultimate task is to delete some random elements from a vector. For that specific problem it is not necessary to pre-generate all random indices in advance. You can simply generate these indices “on the fly”.
For this specific task (i.e. delete 50% of elements in the vector), Knuth algorithm will work pretty well (see https://stackoverflow.com/a/1608585/187690).
Just iterate through all elements of the original vector from
0toN-1and make a random decision to deletei-th element with the probability ofN_to_delete / N_to_iterate, whereN_to_deleteis the number of elements that still have to be deleted, andN_to_iterateis the length of the remaining portion of the vector. This approach does it in one pass (if implemented smartly), requires no extra memory and no trial-and-error iterations. It simply does exactly what you want it to do: destroys 50% of vector elements with equal probability.Knuth algorithm works best in situations when the number of random values (
M) is fairly large compared to the length of the range (N), since its complexity is tied toN. In your case, whereMis 50% ofN, using Knuth algorithm is a pretty good idea.When the number of random values is much smaller than the range (
M << N), Bob Floyd algorithm (see the above link) makes more sense, since its complexity is defined byMand not byN. It requires additional memory (a set), but still makes no trial-and-error iterations when generating random numbers.However, in your case you are trying to delete elements from a vector. Vector element deletion is dominated by
N, which defeats the benefits of Bob Floyd algorithm anyway.