I’m working on a file I scraped from a website. The file is saved as a semicolon-separated CSV with quoted fields.
The last field contains embedded newlines.
I’ve been working on a script to process the file.
I’m fairly new to Perl. At first I tried it with a plain Perl script (without any CSV module), but quickly found out that wasn’t working.
I did my research and found out I should use the Text::CSV module instead. I came across these sites which explained how to use the module:
http://perlmaven.com/how-to-read-a-csv-file-using-perl
http://perlmeme.org/tutorials/parsing_csv.html
http://metacpan.org/pod/Text::CSV#Embedded-newlines
Basically, what I’m trying to accomplish is to read the file correctly, so that all the fields are delimited properly instead of breaking off at a newline, then remove the newlines from that last field and write the result to a new file.
Here is an example of the original data:
"2030";"NH Amersfoort";"Stationsstraat 75";"3811 MH AMERSFOORT";"033-4221200";"www.nh-hotels.nl";"52.154316";"5.380036";"<UL class=stars><LI>
<LI>
<LI>
<LI></LI></UL>"
"2031";"NH Amsterdam Centre";"Stadhouderskade 7";"1054 ES AMSTERDAM";"020-6851351";"www.nh-hotels.com";"52.363075";"4.879458";"<UL class=stars><LI>
<LI>
<LI>
<LI></LI></UL>"
"2032";"NH Atlanta Rotterdam Hotel";"Aert van Nesstraat 4";"3012 CA ROTTERDAM";"010-2067800";"www.nh-hotels.com";"51.921028";"4.478619";"<UL class=stars><LI>
<LI>
<LI>
<LI></LI></UL>"
And what I want is this:
"2030";"NH Amersfoort";"Stationsstraat 75";"3811 MH AMERSFOORT";"033-4221200";"www.nh-hotels.nl";"52.154316";"5.380036";"<UL class=stars><LI><LI><LI><LI></LI></UL>"
"2031";"NH Amsterdam Centre";"Stadhouderskade 7";"1054 ES AMSTERDAM";"020-6851351";"www.nh-hotels.com";"52.363075";"4.879458";"<UL class=stars><LI><LI><LI><LI></LI></UL>"
"2032";"NH Atlanta Rotterdam Hotel";"Aert van Nesstraat 4";"3012 CA ROTTERDAM";"010-2067800";"www.nh-hotels.com";"51.921028";"4.478619";"<UL class=stars><LI><LI><LI><LI></LI></UL>"
This is my full script so far. I have tried 10 different options and suggestions, and none of them work!
use strict;
use warnings;
use Text::CSV;
my $inputfile = shift || die "Give input and output names!\n";
my $outputfile = shift || die "Give output name!\n";
open my $infile, '<', $inputfile or die "Sourcefile in use / not found :$!\n";
open my $outfile, '>', $outputfile or die "Outputfile in use :$!\n";
my $csv = Text::CSV->new ({
    binary => 1,
    sep_char => ';'
});
while (my $elements = $csv->getline( $infile )) {
    my $stars = $elements->[8];
    #$ster =~ s/[\r\n]//g
    print "$stars\n\n";
}
close $infile;
close $outfile;
This prints the field with the newlines in it correctly, but of course hasn’t removed them. How do I do that? Using a regex to substitute the newlines is not working. And the next question, once I do figure out how to clean up that field: how do I print the new file?
I’m not sure what you are asking here, because it seems you already have your answers. However, this code does work:
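A minimal sketch of such a script follows. It is not necessarily identical to the code that was tested; the helper name strip_newlines_in_column and the in-memory sample input are assumptions made for illustration, and always_quote is added here only so the output keeps the quoting shown in the question.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical helper (name is illustrative): reads semicolon-separated CSV
# from $in, removes CR/LF from the field at index $col, and prints each
# record to $out.
sub strip_newlines_in_column {
    my ($in, $out, $col) = @_;

    my $csv = Text::CSV->new({
        binary       => 1,     # allow embedded newlines inside quoted fields
        sep_char     => ';',
        eol          => "\n",  # print() appends this after every record
        always_quote => 1,     # quote every field, as in the input file
    }) or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

    while (my $row = $csv->getline($in)) {
        # Substitute on the array element itself, not on a copy of it.
        $row->[$col] =~ s/[\r\n]+//g if defined $row->[$col];
        $csv->print($out, $row);
    }
    $csv->eof or die "CSV parse error: " . $csv->error_diag;
}

# Sample record taken from the question, read from an in-memory file handle.
my $sample = <<'CSV';
"2030";"NH Amersfoort";"Stationsstraat 75";"3811 MH AMERSFOORT";"033-4221200";"www.nh-hotels.nl";"52.154316";"5.380036";"<UL class=stars><LI>
<LI>
<LI>
<LI></LI></UL>"
CSV

open my $in, '<', \$sample or die "Cannot open sample data: $!\n";
strip_newlines_in_column($in, \*STDOUT, 8);
close $in;
```

To write to a file instead, pass the already opened $outfile handle from the question in place of \*STDOUT.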
Pointers:
Using the eol option with Text::CSV’s print makes it do what you expect, which is to print newlines. I used STDOUT as the output handle, but you can use any file handle you want.
I don’t know why you say substitution does “not work” for you, but I suspect that perhaps you did something like this:
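The difference between editing a copy and editing the field itself can be sketched as follows. This is a hypothetical illustration, not the original snippet: $row here is a hand-built array reference standing in for what $csv->getline returns, and the flag variables are named only for this example.

```perl
use strict;
use warnings;

# Stand-in for a row returned by $csv->getline (hand-built for illustration).
my $row = [ '2030', "<UL class=stars><LI>\n<LI></LI></UL>" ];

# Substituting on a copy: $foo is cleaned, but the row is left untouched.
my $foo = $row->[1];
$foo =~ s/[\r\n]+//g;
my $still_has_newline = $row->[1] =~ /\n/ ? 1 : 0;
print "row field still contains a newline after editing the copy\n"
    if $still_has_newline;

# Substituting on the array element itself: the row is modified.
$row->[1] =~ s/[\r\n]+//g;
my $now_clean = $row->[1] =~ /\n/ ? 0 : 1;
print "row field is clean after editing it in place\n" if $now_clean;
```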
This does not change the values in $row, just the copy in $foo.