I’ve been struggling with this for a while and I was wondering if there was something obvious I’ve missed.
As a programming exercise, I’m trying to put together a simple script for calculating the components of a restriction enzyme digest mix. However, first I need to get a list of enzyme stock concentrations.
I pulled all the individual pages from the New England Biolabs enzyme page, and my goal with this current script is to pull out the name of the enzyme and the concentrations available from the company.
This example works with a local copy of EcoRI (link included at bottom of submission).
use warnings;
use strict;
open(FILE,'productR0101.asp');
my $line;
my $counter;
my $array1;
my $array2;
my $array3;
my $concentration;
my @array4;
$counter = 1;
while ($line = <FILE>) {
    chomp($line);
    if ($counter == 6) {
        $array1 = $line;
        $counter++;
    }
    else {
        $counter++;
    }
    if ($line =~ m/.{8}units.ml/g) {
        (@array4) = $line =~ m/.{8}units.ml/g;
        print @array4;
    }
}
print "\n".$array1;
exit;
Every file has the enzyme name on the sixth line of the file, so I just pulled that whole line. However, the concentrations are in different locations, so my approach was to read in the file one line at a time, and match to the units/ml tag.
My thinking was that it should print out the match for each line, if there was one, every time the while loop runs, effectively resulting in a string of separate print statements.
This is where I get messed up. There are six different locations in this file with a units/ml tag: three for 20,000 and three for 100,000.
I was expecting six different results printed, but when I run this, only one 100,000 units/ml result is returned.
I’ve tried all sorts of fixes. I tried concatenating strings, I tried storing it as a string, I tried concatenating it onto another array that never gets touched by the (@array4) = $line =~ m/.{8}units.ml/g line, and it either breaks it or gives the same result.
And finally, I apologize for any weird conventions. I’m still learning Perl, and my first experience programming was with MATLAB.
Also, the $array1, $array2, etc. exist because I was trying to keep track of exactly what was getting put where; my intention is to clean it up once I get it functional.
So does anyone have any ideas about what I’m doing wrong?
EDIT: the data source is the source code to each individual enzyme page. For this example, if you view the page source you get the complete input file I gave to the script.
We really need to see the data you are processing, but it looks like you are storing only the last occurrence of /units.ml/ in @array4 because you are reading the file line by line.
I will add to this answer if you supplement your question, but for now I need to know
What your data looks like
What the mysterious /.{8}/ is for
Are you aware that $array1, $array2, and $array3 are scalars, as well as being very bad names for variables?
For now, here is a rewrite of your code using idiomatic Perl, and the $. variable that evaluates to the line number of the file most recently read
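A minimal sketch along those lines (not the original rewrite) might look like the following. It keeps the question's pattern /.{8}units.ml/ unchanged and uses $. to pick out line six. The helper names concentrations_in and scan_enzyme_page are hypothetical, and the sample text is made up to stand in for productR0101.asp, since the real page source is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: return every hit of the question's pattern on one line.
# In list context, m//g returns all matches on the line at once.
sub concentrations_in {
    my ($line) = @_;
    return $line =~ m/.{8}units.ml/g;
}

# Hypothetical helper: read a filehandle and return the enzyme name
# (assumed to be on line 6, as the question states) and all hits found.
sub scan_enzyme_page {
    my ($fh) = @_;
    my ($name, @found);
    while (my $line = <$fh>) {
        chomp $line;
        $name = $line if $. == 6;    # $. is the current input line number
        push @found, concentrations_in($line);
    }
    return ($name, \@found);
}

# Made-up sample standing in for the real page source (an assumption).
my $sample = join "\n",
    ('<!-- filler -->') x 5,
    'EcoRI',
    '<td>20,000 units/ml</td><td>100,000 units/ml</td>',
    '<p>no concentration here</p>',
    '<td>20,000 units/ml</td>';

# Reading from an in-memory string; for the real file this would be
# open my $fh, '<', 'productR0101.asp' or die ...
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my ($name, $found) = scan_enzyme_page($fh);
close $fh;

print "$name\n";
print "$_\n" for @$found;
```

Note that the sketch matches each line only once, in list context. The question's code first tests the line with m//g in scalar context, which advances pos() on $line, so the list-context m//g that follows resumes after the first hit. That would explain seeing fewer results than expected when several hits share a line.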