The example code below counts how many times each value in the first column appears and sorts the result.
{ dist[$1]+=1; }
END { for (i in dist) {
print i,dist[i] | "sort"
}
}
I would expect the process to work like this:
(WORKFLOW A)
1) print all the elements of dist and save them to a buffer
2) take everything in the buffer and pipe it to sort
But in the example above, the process looks like this:
(WORKFLOW B)
1) print one element of dist and pipe it to sort
2) move on to the next element of dist, until none are left
I was wondering why I shouldn’t place the sort like this:
{ dist[$1]+=1; }
END { for (i in dist) {
print i,dist[i]
}
| "sort"
}
Does anyone know the reason? And how can I write the pipe if I want the work done like WORKFLOW B?
Thanks!
The reason you can’t do it the second way is that | "command" is part of the syntax of awk’s print statement (and printf); it can’t be attached to arbitrary statements or statement groups. The same goes for > filename.

The way it works is that the first time awk encounters a redirection to a file or pipe, it opens that file or pipe and keeps the descriptor open. Then every time you redirect to the same file or pipe, it sends the output to the corresponding descriptor.
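As a rough illustration of the point above, here is a small sketch comparing the two places the pipe can go. The function names count_inside_awk and count_then_sort and the sample input are made up for this example. In the first function every print writes to the same "sort" pipe, so a single sort process receives all the lines. In the second, the pipe is moved out to the shell, which is one way to get the effect of wrapping the whole loop in a pipe.

```shell
# Hypothetical helper: the pipe is attached to print inside awk.
count_inside_awk() {
    awk '
        { dist[$1] += 1 }
        END {
            # awk opens the "sort" pipe on the first print and reuses
            # the same open pipe for every later print to "sort".
            for (i in dist)
                print i, dist[i] | "sort"
            # Closing the pipe ends sort input, so sort writes its result.
            close("sort")
        }
    '
}

# Hypothetical helper: awk prints to stdout, and the shell pipes
# the whole output of awk into sort.
count_then_sort() {
    awk '
        { dist[$1] += 1 }
        END {
            for (i in dist)
                print i, dist[i]
        }
    ' | sort
}

# Made-up sample input; only the first column matters.
printf 'b x\na y\nb z\nc w\na v\n' | count_inside_awk
printf 'b x\na y\nb z\nc w\na v\n' | count_then_sort
```

Both calls should print the same sorted counts, because in both cases sort sees the complete set of lines before it produces any output.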