I am trying to parse some csv files using awk. I am new to

Question

0

Asked: May 25, 20262026-05-25T02:39:16+00:00 2026-05-25T02:39:16+00:00

I am trying to parse some csv files using awk. I am new to

0

I am trying to parse some csv files using awk. I am new to shell scripting and awk.
The csv file i am working on looks something like this :

fnName,minAccessTime,maxAccessTime
getInfo,300,600
getStage,600,800
getStage,600,800
getInfo,250,620
getInfo,200,700
getStage,700,1000
getInfo,280,600

I need to find the average AccessTimes of the different functions.

I have been working with awk and have been able to get the average times provided the exact column numbers are specified like $2, $3 etc.

However I need to have a general script in which if i input “minAccessTime” in the command argument, I need the script to print the average AccessTime (instead of explicitly specifying $2 or $3 while using awk).

I have been googling about this and saw in various forums but none of them seems to work.
Can someone tell me how to do this ? It would be of great help !

Thanks in advance!!

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-25T02:39:16+00:00

This awk script should give you all that you want.

It first evaluates which column you’re interested in by using the name passed in as the COLM variable and checking against the first line. It converts this into an index (it’s left as the default 0 if it couldn’t find the column).

It then basically runs through all other lines in your input file. On all these other lines (assuming you’ve specified a valid column), it updates the count, sum, minimum and maximum for both the overall data plus each individual function name.

The former is stored in count, sum, min and max. The latter are stored in associative arrays with similar names (with _arr appended).

Then, once all records are read, the END section outputs the information.

NR == 1 {
    for (i = 1; i <= NF; i++) {
        if ($i == COLM) {
            cidx = i;
        }
    }
}

NR > 1 {
    if (cidx > 0) {
        count++;
        sum += $cidx;
        if (count == 1) {
            min = $cidx;
            max = $cidx;
        } else {
            if ($cidx < min) { min = $cidx; }
            if ($cidx > max) { max = $cidx; }
        }

        count_arr[$1]++;
        sum_arr[$1] += $cidx;
        if (count_arr[$1] == 1) {
            min_arr[$1] = $cidx;
            max_arr[$1] = $cidx;
        } else {
            if ($cidx < min_arr[$1]) { min_arr[$1] = $cidx; }
            if ($cidx > max_arr[$1]) { max_arr[$1] = $cidx; }
        }
    }
}

END {
    if (cidx == 0) {
        print "Column '" COLM "' does not exist"
    } else {
        print "Overall:"
        print "   Total records = " count
        print "   Sum of column = " sum
        if (count > 0) {
            print "   Min of column = " min
            print "   Max of column = " max
            print "   Avg of column = " sum / count
        }
        for (task in count_arr) {
            print "Function " task ":"
            print "   Total records = " count_arr[task]
            print "   Sum of column = " sum_arr[task]
            print "   Min of column = " min_arr[task]
            print "   Max of column = " max_arr[task]
            print "   Avg of column = " sum_arr[task] / count_arr[task]
        }
    }
}

Storing that script into qq.awk and placing your sample data into qq.in, then running:

awk -F, -vCOLM=minAccessTime -f qq.awk qq.in

generates the following output, which I’m relatively certain will give you every possible piece of information you need:

Overall:
   Total records = 7
   Sum of column = 2930
   Min of column = 200
   Max of column = 700
   Avg of column = 418.571
Function getStage:
   Total records = 3
   Sum of column = 1900
   Min of column = 600
   Max of column = 700
   Avg of column = 633.333
Function getInfo:
   Total records = 4
   Sum of column = 1030
   Min of column = 200
   Max of column = 300
   Avg of column = 257.5

For `maxAccessTime, you get:

Overall:
   Total records = 7
   Sum of column = 5120
   Min of column = 600
   Max of column = 1000
   Avg of column = 731.429
Function getStage:
   Total records = 3
   Sum of column = 2600
   Min of column = 800
   Max of column = 1000
   Avg of column = 866.667
Function getInfo:
   Total records = 4
   Sum of column = 2520
   Min of column = 600
   Max of column = 700
   Avg of column = 630

And, for xyzzy (a non-existent column), you’ll see:

Column 'xyzzy' does not exist

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I am trying to parse some csv files using awk. I am new to

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply