I am using the following Perl script to do some simple processing:
use strict;
my $file = "data-text";
open(FILE, "<$file") or die "Can't open $file: $!\n";
my @lines = <FILE>;
close FILE;
my @arrayA = (); my @arrayB=();
my $i = 0;
while ($i < @lines) {
    print $lines[$i], "\t", $lines[$i+1], "\n";
    chomp($lines[$i]); chomp($lines[$i+1]); #The problem is here...
    push @arrayA, \$lines[$i];
    push @arrayB, \$lines[$i+1];
    print $lines[$i], "\t", $lines[$i+1], "\n";
    $i += 2;
}
As I indicated in the script, the problem is at the line chomp($lines[$i]); chomp($lines[$i+1]);. When I include this line, the printed lines come out messed up.
What is wrong? Why is this?
chomp deletes a single \n character from the end of a string. If the string ends with \r\n (the Windows-style line ending), chomp will leave the \r in place. This would likely result in symptoms similar to what you’re seeing.
EDIT:
Some background. Unix-like systems (including Linux) use a single line-feed character ('\n') to mark the end of each line in a text file. Windows (and its predecessor MS-DOS) uses two characters, a carriage return and a line feed (\r\n).
Many of Perl’s features are designed to work with text, which means, quite reasonably, that Perl assumes by default that any text file it’s reading uses the native end-of-line representation of the underlying operating system.
A feature Perl inherited from C is that, when reading a line of text, the native end-of-line sequence, whatever it is, is translated to a single '\n' character. (The reverse translation is done on output.) This frees most programs from having to worry about how text is represented; it’s translated to and from a canonical internal form on input and output. (That form happens to match the Unix format, for historical reasons.)
But that doesn’t help much if you need to deal with non-native text files. If you’re running in a Unix-like environment, but reading Windows-format text files, the \r characters are going to look like part of the line. In particular, chomp won’t do anything special with them. And when you print a \r character, it typically causes the cursor to move to the beginning of the current line without advancing to the next line. It’s a mess. (Cygwin is a rich source of such confusion; it’s a Unix-like environment, using Unix-style text files by default, but it runs under Windows with full visibility to the Windows file system. Are you using Cygwin?)
See @BillRupert’s comment; he’s running under Windows with a Windows native implementation of Perl, so he doesn’t see the problem you’re having.
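To make the leftover carriage return concrete, here is a minimal sketch. The sample strings are made up for illustration, and it assumes the default input record separator ($/ set to "\n"), as on a Unix-like system:

```perl
use strict;
use warnings;

# A line as it would arrive from a Windows-format file read on a Unix-like system.
my $line = "first line\r\n";

chomp($line);    # removes only the trailing "\n" (the default value of $/)

# The carriage return survives, so the string is one character longer than
# its visible text ("first line" is 10 characters).
printf "length after chomp: %d\n", length $line;    # prints 11
print "still ends with a carriage return\n" if $line =~ /\r\z/;

# Printing two such strings with a tab between them, as the question's loop
# does: each "\r" sends the cursor back to column 0, so on a terminal the
# later text is drawn over the earlier text.
my $other = "second line\r";
print $line, "\t", $other, "\n";
```

On a terminal, the last print typically appears garbled because of the embedded \r characters, which matches the kind of "messed up" output described in the question.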
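If you want to deal with non-native text files, you’ll need to do a little extra work. For example, when reading a line of text, rather than just calling chomp on it, you might strip an optional \r along with the trailing \n, for instance with a substitution such as s/\r?\n$//. And when writing text, you can print the line terminator you want explicitly, either "\r\n" or "\n".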
But first you’ll need to decide whether you want to write the output with Windows-style or Unix-style line endings. It’s tricky.
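A rough sketch of both halves, reading lines with either ending and writing them back with a chosen ending, might look like the following. The helper name strip_eol, the $eol variable, and the in-memory sample data are all assumptions made for illustration, and the sketch assumes a Unix-like Perl where no end-of-line translation is applied on output:

```perl
use strict;
use warnings;

# Hypothetical helper: remove a trailing LF or CRLF, whichever is present.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

# In-memory sample standing in for a Windows-format text file.
my $input = "alpha\r\nbeta\r\ngamma\r\n";
open(my $in, '<', \$input) or die "Can't open in-memory input: $!\n";
my @lines = map { strip_eol($_) } <$in>;
close $in;

# Decide on the output convention once, then use it everywhere.
my $eol = "\n";    # use "\r\n" instead for Windows-style output

my $output = '';
open(my $out, '>', \$output) or die "Can't open in-memory output: $!\n";
print {$out} $_, $eol for @lines;
close $out;
```

With a real file, the in-memory handles would be replaced by ordinary three-argument open calls on the file names. Keeping the terminator in one variable means switching conventions later is a one-line change.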
(There’s probably a Perl module that makes this easier; anyone who knows of one, please mention it in a comment.)
Incidentally, the output you’re seeing isn’t the output your program is producing. If you filter your output through something that shows non-printable characters in printable form, you’ll see \r or ^M in your output. Use ... | cat -A or ... | cat -v if your system has the cat command.
If possible, you might consider translating your input before trying to read it.
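On the point about making non-printable characters visible: if cat isn’t available, something similar can be done from Perl itself. This is only a sketch; the function name show_controls is hypothetical, and it only loosely mimics cat -v (control characters other than tab and line feed are shown in caret notation, and nothing else is handled):

```perl
use strict;
use warnings;

# Hypothetical helper: show control characters (except tab and line feed)
# in caret notation, so that "\r" becomes "^M".
sub show_controls {
    my ($text) = @_;
    $text =~ s/([\x00-\x08\x0B-\x1F])/'^' . chr(ord($1) + 64)/ge;
    return $text;
}

# A string with embedded carriage returns, like the chomped lines in the question.
print show_controls("first line\r\tsecond line\r\n");
# prints: first line^M<TAB>second line^M   (with a real tab in place of <TAB>)
```

Wrapping the print arguments in the question's loop with a call like this would reveal whether stray \r characters are present in the data.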