I have a problem I am trying to solve using ONLY awk.
I have a CSV file with this structure:
Easting Northing Latitude Longitude Locality Name
Easting "Northing" "Latitude" "Longitude" "LocalityName"
364208 176288 51.48441 -2.51685 "Fishponds"
358596 172813 51.45278 -2.59726 "Bristol City Centre"
358886 177828 51.49789 -2.59367 "Southmead"
358839 177839 51.49798 -2.59435 "Southmead"
358980 177882 51.49838 -2.59232 "Southmead"
359009 177863 51.49821 -2.5919 "Southmead"
358839 177529 51.4952 -2.59431 "Southmead"
359475 168262 51.41192 -2.58409 "Hengrove Park"
358945 173526 51.45921 -2.59232 "Bristol"
358943 173525 51.4592 -2.59235 "Bristol"
358941 173524 51.45919 -2.59238 "Bristol"
358940 173523 51.45919 -2.59239 "Bristol"
358945 173528 51.45923 -2.59232 "Bristol"
358936 173520 51.45916 -2.59245 "Bristol"
358936 173521 51.45917 -2.59245 "Bristol"
358932 173516 51.45912 -2.5925 "Bristol"
etc… I am trying to write an awk script which will count each instance of Locality Name and then print the counts, so that the output would be:
Fishponds 1
Bristol City Centre 1
Southmead 5
Hengrove Park 1
Bristol 8
So far I have got this:
BEGIN { i = 0; state = 0; names[NR]; FS=","; }
{
    # For each element in the names array, check if it already exists.
    for(j=0;j<=i;j++)
    {
        if(names[j] == $5)
        {
            state = 1;
            break;
        }
    }
    # If the name doesn't already exist, add it to the names array.
    if(state == 0)
    {
        names[i] = $5;
        i++;
    }
    state = 0;
}
END {
    for(x=0;x<=i;x++)
    {
        print names[x];
    }
}
This hopefully collects the locations and removes the duplicates, but I still cannot think of a good way to count the instances of each location and then list them back.
Simpler solution:
First, the input field separator is set to a double quote ("), so the second field will be the location name. The first line (the header) is skipped. An array keyed by the second field counts the occurrences. After the last line, the array's keys and values are printed.
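The steps described above could be sketched roughly as follows. This is a minimal sketch, not necessarily the exact command originally posted: the file name localities.txt and the array name count are assumptions made for illustration, and the sample input is just a few rows copied from the question.

```shell
# Hypothetical sample file built from a few rows of the question.
cat > localities.txt <<'EOF'
Easting "Northing" "Latitude" "Longitude" "LocalityName"
364208 176288 51.48441 -2.51685 "Fishponds"
358886 177828 51.49789 -2.59367 "Southmead"
358839 177839 51.49798 -2.59435 "Southmead"
358945 173526 51.45921 -2.59232 "Bristol"
EOF

# -F'"' splits each line on double quotes, so $2 is the quoted locality name.
# NR > 1 skips the header line; count[] is keyed by locality name.
awk -F'"' '
NR > 1 { count[$2]++ }
END    { for (name in count) print name, count[name] }
' localities.txt
```

Note that awk does not guarantee any particular order for a "for (name in count)" loop, so the lines may not come out in the order the localities first appear. If a stable order matters, the output can be piped through sort.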