I asked previously how to correct errors in count data using awk, where the first column of my data is a number used to identify the sub-arena that’s being measured, and the second column is the count data from that sub-arena. The counting is automated and the program makes errors (indicated below with #), where it will occasionally ‘miscount’ because the animals that are being counted have moved outside the range of the specific sub-arena.
1 0
1 2
1 6
1 7
1 7
1 8
1 7 #
1 7 #
1 9
2 0
2 0
2 1
2 4
2 3 #
2 3 #
2 4
2 4
2 6
I’d like to correct the above like so:
1 0
1 2
1 6
1 7
1 7
1 8
1 8
1 8
1 9
2 0
2 0
2 1
2 4
2 4
2 4
2 4
2 4
2 6
The code that was kindly suggested didn’t include a for loop for correcting within the data for each arena (there are 20 total per file). I’ve been trying to figure this out but am having an incredibly hard time, with syntax errors sometimes and illegal statement errors other times. I’d appreciate any hints as to why the following won’t work (sorry I’m such a newbie, this is one of the many iterations that I’ve tried and none of them are pretty):
awk 'i=1; i<=20; i++; $1=i {NR > 1 && $2 < p {$2 = p} {p = $2} 1}' infile > outfile
Rather than counting the lines, why not have another variable track the arena number from the first column, and reset `p` whenever that number increases?

First, the first field (`$1`) is compared to the value in the `l` variable (which defaults to 0). If it’s greater, `l` is set to `$1` and `p` is reset to 0. Then the second field (`$2`) is compared to `p`, and if it’s less, it is set to `p`. Finally, `p` is set to the value of the (possibly changed) `$2`. The final `1` just means “print”; otherwise the command would do all the processing but not print any of it.
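A minimal sketch of the steps above, assuming the variable names `l` and `p` from the description, whitespace-separated input, and arena numbers that are positive and only increase down the file. The `fix_counts` wrapper and the sample data are hypothetical, added only so the snippet runs on its own:

```shell
# fix_counts is a hypothetical wrapper; it reads "arena count" pairs
# from the files given as arguments, or from stdin if there are none.
fix_counts() {
    awk '
        $1 > l { l = $1; p = 0 }   # higher arena number: remember it, reset previous count
        $2 < p { $2 = p }          # count fell below the previous one: carry previous count forward
        { p = $2 }                 # remember the (possibly corrected) count for the next line
        1                          # always-true pattern with no action: print the line
    ' "$@"
}

# Made-up sample: arena 1 dips from 8 to 7, arena 2 dips from 4 to 3.
printf '1 8\n1 7\n1 9\n2 0\n2 4\n2 3\n' | fix_counts
# prints:
# 1 8
# 1 8
# 1 9
# 2 0
# 2 4
# 2 4
```

To process a file, pass it as an argument and redirect the output: `fix_counts infile > outfile`. Because `p` is reset to 0 at each new arena, a low count at the start of arena 2 is not mistaken for an error relative to the end of arena 1.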