I am writing a perl script to parse tab delimited data from standard input.
The script removes leading and trailing whitespace, blanks out any field that has the string “NULL” and re-formats date columns from “MMM DD YYYY HH:MM:SS:SSSAM” format to “YYYYMMDD” format.
Sample Input:
93092 Apr 1 2010 12:00:00:000AM 59668370.60702875
22341 Apr 1 2010 12:00:00:000AM 51309196.84639429
27844 Apr 1 2010 12:00:00:000AM NULL
150465 Apr 22 2010 12:00:00:000AM 19706190.97586569
119364 Jul 20 2010 12:00:00:000AM 16335977.41009162
Target Output:
93092|20100401|59668370.60702875
22341|20100401|51309196.84639429
27844|20100401|
150465|20100422|19706190.97586569
119364|20100720|16335977.41009162
The script takes an argument representing the column(s) which have dates that need conversion. In the sample above, I would invoke with “1” as the param, since the 2nd column is the date that needs conversion. More than one column would be represented by a comma separated list.
This is what I’ve been able to do so far.
#!/usr/bin/perl
my @date_cols = split(/,/, $ARGV[0]);
while (<STDIN>) {
my @fields = split(/\t/, $_, -1);
for (@fields) {
s/^\s+//;
s/\s+\z//;
s/^NULL\z//;
}
for (@fields[@date_cols]) {
##NEED HELP WITH DATE FORMATTING
}
print(join('|', @fields), "\n");
}
Using Time::Piece is simple and easily provides you with the date formmating. The
strptimefunction lets you define the pattern you want to work with; thestrftimefunction lets you produce the output format desired. Consider:A nice feature of this approach is that it doesn’t matter whether one or two spaces separate the month and day fields; the results are the same.