I’m attempting to convert dates from one format to another:
From e.g. “October 29, 2005” to 2005-10-29.
I have a list of 625 dates. I use Awk.
The conversion works — most of the time.
However, sometimes the conversion won’t happen at all,
and the variable supposed to hold the (converted) date remains
undefined.
This always happens with the exact same rows.
Running `date` explicitly (from the Bash shell) on the dates
of those weird rows works fine (the dates are properly converted).
— It’s not the textual contents of those rows that matters.
Why this behavior, and how can I fix my script?
Here it is:
awk 'BEGIN { FS = "unused" } {
x = "undefined";
"date \"+%Y-%m-%d\" -d " $1 | getline x ;
print $1 " = " x
}' uBXr0r15.txt \
> bug-out-3.txt
If you want to reproduce this problem:
- Download this file: uBXr0r15.txt.
- Run the Awk script.
- Search for “undefined” in bug-out-3.txt.
(“undefined” found 122 times, on my computer.)
Then you could run the script again,
and (on my computer) bug-out-3.txt remains
unchanged — exactly the same dates are left undefined.
(Gawk version 3.1.6, Ubuntu 9.10.)
Kind regards, Magnus
Whenever you open a pipe or file for reading or writing in awk, the latter will first check (using an internal hash) whether it already has a pipe or file with the same name (still) open; if so, it will reuse the existing file descriptor instead of reopening the pipe or file.

In your case, all entries which end up as “undefined” are actually duplicates; the first time that they are encountered (i.e. when the corresponding command date "..." -d "..." is first issued) the proper result is read into x. On subsequent occurrences of the same date, getline attempts to read a second, third etc. line from the original date pipe, even though the pipe has been closed by date, so x is no longer assigned.

The gawk man page notes the same thing: if you use a pipe with getline inside a loop, you must call close() to create new instances of the command.

You should explicitly close the pipe every time after you have read x, by calling close() with exactly the same command string that was used to open it.

Incidentally, would it be OK to sort and uniq uBXr0r15.txt before piping into awk, or do you need the original ordering/duplication?
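To make the close() suggestion concrete, here is a minimal sketch of the script with the pipe closed after each read. It is not tested against uBXr0r15.txt itself: sample-dates.txt and sample-out.txt are hypothetical names, and the sample rows are assumed to carry their own double quotes (the original command passes $1 to the shell unquoted, so the real rows presumably do too). GNU date is assumed for the -d option.

```shell
# Hypothetical sample input with a deliberate duplicate in row 3.
# Assumption: each row includes its own double quotes, as the rows
# in the real file presumably do.
printf '%s\n' '"October 29, 2005"' '"March 1, 2004"' '"October 29, 2005"' > sample-dates.txt

awk 'BEGIN { FS = "unused" } {
    x = "undefined"
    # Build the command once so the very same string is used
    # both to open the pipe and to close it.
    cmd = "date \"+%Y-%m-%d\" -d " $1
    cmd | getline x
    # Without this close(), a repeated date would reuse the
    # exhausted pipe and x would stay "undefined".
    close(cmd)
    print $1 " = " x
}' sample-dates.txt > sample-out.txt
```

Keeping the command in a variable is only a convenience: close() looks the pipe up by its command string, so the string passed to it has to match the one used with getline character for character. The original ordering and the duplicates are preserved, at the cost of running date once per row.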