I intend to generate random numbers using the following steps:
- Read the data from a file (<DATA>)
- Generate as many random numbers as there are input data lines
- No random number should be generated twice,
e.g. if the random number generated in iteration x has been created
before, then regenerate it.
Here is the code I have, which leads to an infinite loop.
What’s wrong with my logic, and how can I fix it?
#!/usr/bin/perl -w
use strict;
my %chrsize = ('chr1' =>249250621);
# For example case I have created the
# repository where a value has been inserted.
my %done =("chr1 182881372" => 1);
while ( <DATA> ) {
    chomp;
    next if (/^\#/);
    my ($chr,$pos) = split(/\s+/,$_);
    # this number has been generated before
    # with this: int(rand($chrsize{$chr}));
    # hence have to create other than this one
    my $newst =182881372;
    my $newpos = $chr ."\t".$newst;
    # recreate random number
    for (0...10){
        if ( $done{$newpos} ) {
            # INFINITE LOOP
            $newst = int(rand($chrsize{$chr}));
            redo;
        }
    }
    $done{$newpos}=1;
    print "$newpos\n";
}
__DATA__
# In reality there are 20M of such lines
# name positions
chr1 157705682
chr1 19492676
chr1 169660680
chr1 226586538
chr1 182881372
chr1 11246753
chr1 69961084
chr1 180227256
chr1 141449512
You had a couple of errors:
- You were only updating $newst within your loop every time, so $newpos never took on a new value.
- Your inner for loop didn’t make sense, because you never actually changed $newpos before checking the condition again.
- redo; was working on the inner loop.
Here is a corrected version that avoids redo altogether.
Update: I edited the algorithm a bit to make it simpler.
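The answer's original code is not reproduced here. As a minimal sketch of what a redo-free version could look like, the loop below rebuilds the whole key on every attempt, so the check against %done always sees the new value. The helper name unique_pos, the tab-separated key, and the inline sample lines (standing in for <DATA>) are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %chrsize = ('chr1' => 249250621);

# Positions already handed out, keyed as "chr\tpos" (assumed key format).
my %done = ("chr1\t182881372" => 1);

# Hypothetical helper: return a "chr\tpos" string that is not yet in %$done
# and record it. The key is rebuilt on every attempt, so no redo is needed.
sub unique_pos {
    my ($chr, $size, $done) = @_;
    my $newpos;
    do {
        $newpos = $chr . "\t" . int(rand($size));
    } while ($done->{$newpos});
    $done->{$newpos} = 1;
    return $newpos;
}

# Sample input; in the question these lines come from <DATA>.
my @lines = (
    "# name positions",
    "chr1 157705682",
    "chr1 19492676",
    "chr1 182881372",
);

for my $line (@lines) {
    next if $line =~ /^\#/;
    my ($chr, $pos) = split /\s+/, $line;
    print unique_pos($chr, $chrsize{$chr}, \%done), "\n";
}
```

Note that the do/while loop only terminates while unused positions remain for that chromosome, so it relies on the number of input lines being smaller than the chromosome size.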
Update 2: While the above algorithm will work, it will get really slow on 20,000,000 lines. Here is an alternative approach that should be faster (there is sort of a pattern to the random numbers it generates, but it would probably be ok for most situations).
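The code for this alternative is not shown here, so the following is only one possible scheme that fits the description, not necessarily the one the answer used. It assumes a bucketed (stratified) draw: the range is split into as many equal-width buckets as there are input lines, and one value is drawn from each bucket. Values cannot collide, so no lookup hash is needed, but the results come out in ascending order, which is the kind of pattern mentioned above. The name stratified_positions is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pick $count distinct integers in [0, $size) by drawing
# one value from each of $count equal-width, non-overlapping buckets.
# Requires $count <= $size so that every bucket holds at least one value.
sub stratified_positions {
    my ($size, $count) = @_;
    die "count must not exceed size\n" if $count > $size;
    my @positions;
    for my $i (0 .. $count - 1) {
        my $lo = int($i * $size / $count);        # first value in bucket $i
        my $hi = int(($i + 1) * $size / $count);  # first value in the next bucket
        push @positions, $lo + int(rand($hi - $lo));
    }
    return @positions;
}

# Usage: one position per input line (sample lines stand in for <DATA>).
my @lines = ("chr1 157705682", "chr1 19492676", "chr1 169660680");
my @pos = stratified_positions(249250621, scalar @lines);
print "chr1\t$_\n" for @pos;
```

For 20M lines it would likely be better to print each value inside the loop rather than collect them in an array. If the ascending order matters, the list could be shuffled afterwards, for example with shuffle from the core List::Util module.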