I have many log files like this:
……
……
cpu time 9.05 seconds
real time 8:02.07
……
……
cpu time 2:25.23
real time 1:39:44.15
……
……
To get all the times, I simply grep for all the cpu time and real time lines.
Then I sort the grep output.
I am using AIX 5.2, where sort can order by string or by numeric value.
But there is no sort by hour:minute:second.
To solve this problem, I pass the grep output lines to a while loop.
Then I create a new variable using sed 's/:/00/g'.
This new variable turns hh:mm:ss.xx into hh00mm00ss.xx,
and I then sort by this new variable as a number.
This way, I can find the most time-consuming steps.
This workaround does the job, but it is a little slow.
Does anyone have a better alternative?
Thanks in advance.
Alvin SIU
In the paper ‘Theory and Practice in the Construction of a Working Sort Routine’, J P Linderman shows that the best way to get good performance out of the system sort command (which is the ‘sort routine’ he was working on) with complex keys was to create commands to generate keys that make the comparisons simple.
In the example, the sort command with the complex key was:
The alternative mechanism used a key generator to make it easy to sort:
and the key generator was:
and the key stripper was:
For the test data Linderman was working with, this reduced the elapsed time from around 2100 seconds for the elaborate sort command to about 600 seconds for the awk | sort | awk combination.
Adopting that idea here, I’d use a Perl script to present the disparate time values uniformly in a format that sort can handle trivially.
In this case, you seem to have a variety of time formats to worry about:
It is not clear whether you need to preserve the context of the lines you are sorting, but it seems to me that I’d convert the times to a canonical form. Do you need to allow for 3-digit hours of real time? If the time goes to 20.05 seconds, does the suffix remain? If the time goes to 80.05 seconds, is that printed as 1:20.05? I’m assuming yes…
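To make the idea of a canonical form concrete, here is a minimal sketch in POSIX shell and awk (not the Perl script this answer refers to). The helper name canon_time and the zero-padded HHH:MM:SS.xx layout are my own assumptions. It accepts a bare seconds value, mm:ss.xx or h:mm:ss.xx, and normalizes values such as 80.05 seconds so that they compare equal to 1:20.05.

```shell
#!/bin/sh
# canon_time: hypothetical helper, not part of the original answer.
# Prints the time given as $1 in zero-padded HHH:MM:SS.xx form.
canon_time() {
    printf '%s\n' "$1" | awk '{
        n = split($1, part, ":")            # one to three colon-separated fields
        total = 0
        for (i = 1; i <= n; i++)            # fold h:m:s into a number of seconds
            total = total * 60 + part[i]
        cs = int(total * 100 + 0.5)         # whole centiseconds, avoids float drift
        printf "%03d:%02d:%02d.%02d\n",
            int(cs / 360000),               # hours, padded to three digits
            int((cs % 360000) / 6000),      # minutes
            int((cs % 6000) / 100),         # seconds
            cs % 100                        # hundredths
    }'
}

canon_time 9.05         # 000:00:09.05
canon_time 80.05        # 000:01:20.05
canon_time 8:02.07      # 000:08:02.07
canon_time 1:39:44.15   # 001:39:44.15
```

Because every field has a fixed width, strings in this form sort correctly with a plain string sort, with no numeric option needed.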
Given the input data:
This generates the output data:
Which can be fed into a simple sort, to yield:
And from which the sort column can be stripped with ‘sed’ to yield:
So, given that the data file is ‘xx.data’ and the Perl script is xx.pl, the command line is:
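For readers without the Perl script to hand, the same generate-key, sort, strip-key pattern can be sketched with awk and sed alone. This is an illustration under assumptions, not the script described above: the function name sort_by_time is hypothetical, the time is assumed to be the third whitespace-separated field of each line, and the key is the time converted to seconds.

```shell
#!/bin/sh
# sort_by_time: hypothetical helper, an awk stand-in for the Perl key generator.
# Reads "cpu time ..." / "real time ..." lines on stdin, prints them ordered by time.
sort_by_time() {
    awk '{
        # Key generator: assumes the time is the 3rd field of the line.
        n = split($3, part, ":")
        total = 0
        for (i = 1; i <= n; i++)             # fold h:m:s into seconds
            total = total * 60 + part[i]
        printf "%.2f %s\n", total, $0        # prefix the line with its key
    }' |
    LC_ALL=C sort -n |                       # numeric sort on the leading key
    sed 's/^[0-9.]* //'                      # key stripper: drop the prefix
}

sort_by_time <<'EOF'
real time 1:39:44.15
cpu time 9.05 seconds
real time 8:02.07
cpu time 2:25.23
EOF
```

The output is in ascending order, so the most time-consuming step comes last; use sort -rn instead to list it first. Since the whole input passes through one awk, one sort, and one sed process, it avoids starting a sed process per line in a shell while loop.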