I have to find all the positions of matching strings within a larger string using a while loop, and as a second method using a foreach loop. I have figured out the while loop method, but I am stuck on a foreach method. Here is the ‘while’ method:
….
my $sequence =
'AACAAATTGAAACAATAAACAGAAACAAAAATGGATGCGATCAAGAAAAAGATGC'.
'AGGCGATGAAAATCGAGAAGGATAACGCTCTCGATCGAGCCGATGCCGCGGAAGA'.
'AAAAGTACGTCAAATGACGGAAAAGTTGGAACGAATCGAGGAAGAACTACGTGAT'.
'ACCCAGAAAAAGATGATGCNAACTGAAAATGATTTAGATAAAGCACAGGAAGATT'.
'TATCTGTTGCAAATACCAACTTGGAAGATAAGGAAAAGAAAGTTCAAGAGGCGGA'.
'GGCTGAGGTAGCANCCCTGAATCGTCGTATGACACTTCTGGAAGAGGAATTGGAA'.
'CGAGCTGAGGAACGTTTGAAGATTGCAACGGATAAATTGGAAGAAGCAACACATA'.
'CAGCTGATGAATCTGAACGTGTTCGCNAGGTTATGGAAA';
my $string = <STDIN>;
chomp $string;
while ($sequence =~ /$string/gi )
{
printf "Sequence found at position: %d\n", pos($sequence)- length($string);
}
Here is my foreach method:
foreach ($sequence =~ /$string/gi )
{
printf "Sequence found at position: %d\n", pos($sequence) - length($string);
}
Could someone please give me a clue on why it doesn’t work the same way?
Thanks!
My Output if I input “aaca”:
Part 1 using a while loop
Sequence found at position: 0
Sequence found at position: 10
Sequence found at position: 17
Sequence found at position: 23
Sequence found at position: 377
Part 2 using a foreach loop
Sequence found at position: -4
Sequence found at position: -4
Sequence found at position: -4
Sequence found at position: -4
Sequence found at position: -4
Your problem here is context. In the `while` loop, the condition is in scalar context. In scalar context, the match operator in `g` mode will sequentially match along the string. Thus checking `pos` within the loop does what you want.

In the `foreach` loop, the condition is in list context. In list context, the match operator in `g` mode will return a list of all matches (and it will calculate all of the matches before the loop body is ever entered). `foreach` is then loading the matches one by one into `$_` for you, but you are never using the variable. `pos` in the body of the loop is not useful: by the time the body runs, matching has already finished and `pos($sequence)` has been reset to undef, which counts as 0 in the subtraction. That is why every line of your output shows 0 minus the length of `aaca`, i.e. -4.

The takeaway here is that if you want `pos` to work, and you are using the `g` modifier, you should use the `while` loop, which imposes scalar context and makes the regex iterate across the matches in the string.

Sinan inspired me to write a few `foreach` examples. One is fairly succinct, using `split` in separator retention mode; another is a regex equivalent of the `split` solution. But this is clearly the best solution for your problem:
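A minimal sketch of a lazily evaluated tied array of this kind, not the original listing. The package name `Dumb::Homework` is taken from this answer; the `tie` arguments, the hash keys, and the `_advance` helper are assumptions made for illustration, and the sequence is a shortened stand-in for the one in the question.

```perl
use strict;
use warnings;

package Dumb::Homework;

# Assumed interface: tie my @positions, 'Dumb::Homework', $haystack, $needle;
sub TIEARRAY {
    my ($class, $haystack, $needle) = @_;
    my $self = bless {
        haystack => $haystack,   # private copy, so it carries its own pos()
        needle   => $needle,
        found    => [],          # cache of match start offsets seen so far
        done     => 0,           # set once the regex stops matching
    }, $class;
    $self->_advance;             # look for the first match up front
    return $self;
}

# Hypothetical helper: find one more match and cache its start offset.
sub _advance {
    my ($self) = @_;
    return if $self->{done};
    my $needle = $self->{needle};
    if ($self->{haystack} =~ /\Q$needle\E/gi) {
        push @{ $self->{found} }, $-[0];   # $-[0] is where the match started
    }
    else {
        $self->{done} = 1;
    }
    return;
}

# Return the cached offset, first looking one match past the requested index
# so that FETCHSIZE already knows whether another element exists.
sub FETCH {
    my ($self, $index) = @_;
    $self->_advance while !$self->{done} && @{ $self->{found} } <= $index + 1;
    return $self->{found}[$index];
}

# Number of matches discovered so far (not necessarily the final total).
sub FETCHSIZE { scalar @{ $_[0]{found} } }

package main;

my $sequence = 'AACAAATTGAAACAATAAACAGAAACAAAAATGG';   # shortened stand-in
my $string   = 'aaca';

tie my @positions, 'Dumb::Homework', $sequence, $string;

foreach my $position (@positions) {
    print "Sequence found at position: $position\n";
}
```

In this sketch the array only grows as elements are read, so the loop body is expected to use each element, and asking for `scalar @positions` before the loop has finished reports only the matches found so far.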
The reason it's the best is that the other two solutions have to process the entire global match before you ever see a result. For large inputs (like DNA) that could be a problem. The `Dumb::Homework` package implements an array that will lazily find the next position each time the `foreach` iterator asks for it. It will even store the positions so you can get to them again without reprocessing. (In truth it looks one match past the requested match. This allows it to end properly in the `foreach`, but it is still much better than processing the whole list.)

Actually, the best solution is still to not use `foreach`, as it is not the correct tool for the job.
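For comparison, here is a rough sketch of the scalar-context `while` loop next to two eager `foreach` variants in the spirit of the `split` and regex approaches mentioned earlier. These are illustrations written under assumptions, not the original listings: the variable names are made up, the sequence is a shortened stand-in, and `\Q...\E` is added so the input is matched literally.

```perl
use strict;
use warnings;

my $sequence = 'AACAAATTGAAACAATAAACAGAAACAAAAATGG';   # shortened stand-in
my $string   = 'aaca';

# Scalar context: one match per evaluation, so the match variables are
# up to date inside the body. $-[0] is the start offset of the last match.
my @from_while;
while ($sequence =~ /\Q$string\E/gi) {
    push @from_while, $-[0];
}

# List context: all matches are collected in one go, and pos() is reset
# to undef once matching has run off the end of the string.
my @matches = $sequence =~ /\Q$string\E/gi;
my $pos_after_list_match = pos($sequence);

# foreach over split with a capturing separator: the pieces alternate
# between text and the retained separator, so a running length total
# gives the offset of each separator.
my @from_split;
my $offset = 0;
my @pieces = split /(\Q$string\E)/i, $sequence;
foreach my $i (0 .. $#pieces) {
    push @from_split, $offset if $i % 2;   # odd indexes hold the separators
    $offset += length $pieces[$i];
}

# foreach over a list-context regex: capture the gap before each match
# and add up the lengths consumed so far.
my @from_regex;
my $consumed = 0;
foreach my $gap ($sequence =~ /(.*?)\Q$string\E/gis) {
    push @from_regex, $consumed + length $gap;
    $consumed += length($gap) + length($string);
}

print "while:  @from_while\n";
print "split:  @from_split\n";
print "regex:  @from_regex\n";
print "pos() after list-context match is ",
    defined $pos_after_list_match ? $pos_after_list_match : 'undef', "\n";
```

Both `foreach` variants build their whole list before the first iteration, which is the cost discussed in this answer, and the regex variant assumes the matched text has the same length as the search string.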