I have 2 data files: file01 and file02 . In both data sets fields

Question

0

Asked: June 3, 20262026-06-03T16:54:41+00:00 2026-06-03T16:54:41+00:00

I have 2 data files: file01 and file02 . In both data sets fields

0

I have 2 data files: file01 and file02. In both data sets fields are: (i) an identificator; (ii) a numeric reference; (iii) longitude; and (iv) latitude.
For each row in file01, I want to seach the data in file02 with the same numeric reference and then, find the identificator in file02 which is nearest to the identificator in file01.

I can get this if I pass manually the values from file01 to the awk program using the following code:

awk 'function acos(x) { return atan2(sqrt(1-x*x), x) }
BEGIN {pi=3.14159;
       ndist=999999999.1;
       date=1001;
       lo1=-1.20; lg1=lo1*(pi/180);
       la1=30.31; lt1=la1*(pi/180)
           }
{if($2==date) {ws=$1;
               lg2=$3*(pi/180);
               lt2=$4*(pi/180);
               dist= 6378.7 * acos( sin(lt1)*sin(lt2) + cos(lt1)*cos(lt2)*cos(lg2-lg1) );
               if(dist < ndist) {ndist=dist; ws0=ws}}}
END {print(ws0,ndist)}' file02

As you see, date, lo1 and la1 in the BEGIN statement are the values in the 1st row of file01 (see below for data files). My question is if I could do that at once, so each time I read a row in file01, I get the nearest identificator and the distance and append to the row data in file01. I do not know if some shell command could achieve this in a easier way, maybe using a pipe.

An example of these two data files and the desired output are:

=== file01 ===

A 1001 -1.2 30.31
A 1002 -1.2 30.31
B 1002 -1.8 30.82
B 1003 -1.8 30.82
C 1001 -2.1 28.55

=== file02 ===

ws1 1000 -1.3 29.01
ws1 1001 -1.3 29.01
ws1 1002 -1.3 29.01
ws1 1003 -1.3 29.01
ws1 1004 -1.3 29.01
ws1 1005 -1.3 29.01
ws2 1000 -1.5 30.12
ws2 1002 -1.5 30.12
ws2 1003 -1.5 30.12
ws2 1004 -1.5 30.12
ws2 1005 -1.5 30.12
ws3 1000 -1.7 29.55
ws3 1001 -1.7 29.55
ws3 1002 -1.7 29.55
ws3 1003 -1.7 29.55
ws3 1004 -1.7 29.55
ws3 1005 -1.7 29.55
ws4 1000 -1.9 30.33
ws4 1001 -1.9 30.33
ws4 1002 -1.9 30.33
ws4 1003 -1.9 30.33
ws4 1004 -1.9 30.33
ws4 1005 -1.9 30.33

=== output file ===

A 1001 -1.2 30.31 ws4 67.308
A 1002 -1.2 30.31 ws2 35.783
B 1002 -1.8 30.82 ws4 55.387
B 1003 -1.8 30.82 ws4 55.387
C 1001 -2.1 28.55 ws1 85.369

EDIT #1: Considering the suggestion by @Eran, I wrote the following code:

join -j 2 < (sort -k 2,2 file01) < (sort -k 2,2 file02) |
awk 'function acos(x) { return atan2(sqrt(1-x*x), x) }
     BEGIN {pi=3.14159}

     {if (last != $1 $2)
         {print NR, id,r,lon,lat,ws0,ndist;
          last = $1 $2;
          ndist=999999999.1

         } else {

          lg1=$3*(pi/180);
          lt1=$4*(pi/180);
          lg2=$6*(pi/180);
          lt2=$7*(pi/180);
          dist= 6378.7 * acos( sin(lt1)*sin(lt2) + cos(lt1)*cos(lt2)*cos(lg2-lg1) );
          if(dist< ndist) {ndist=dist; ws0=$5}
          id=$2;r=$1;lon=$3;lat=$4

          }
     }'

The output from this script is:

1      
4  A 1001 -1.2 30.31 ws4 67.3078
7  C 1001 -2.0 28.55 ws3 115.094
11 A 1002 -1.2 30.31 ws2 35.7827
15 B 1002 -1.8 30.82 ws4 55.387

EDIT #2: Using athe suggestion of @Dennis (with some modifications) I have got the desired output. The awk script is as follows:


awk 'function acos(x) { return atan2(sqrt(1-x*x), x) }
     BEGIN {pi=3.14159}
     NR==FNR {c++; a1[c]=$1;a2[c]=$2;a3[c]=$3;a4[c]=$4; next}
             {d++; b1[d]=$1;b2[d]=$2;b3[d]=$3;b4[d]=$4}

     END {
     for(k=1;k<=c;k++) {
         lg1=a3[k]*(pi/180);
         lt1=a4[k]*(pi/180);
         ndist=999999999.1;
         for(l=1;l<=d;l++) {
             if(b2[l]==a2[k]) {kk=b2[l];
                lg2=b3[l]*(pi/180);
                lt2=b4[l]*(pi/180);
                dist= 6378.7 * acos( sin(lt1)*sin(lt2) + cos(lt1)*cos(lt2)*cos(lg2-lg1) );
                if(dist<ndist) {ndist=dist; ws0=b1[l]}
             }
         }
         print a1[k],a2[k],a3[k],a4[k],ws0,ndist
     }
    }' file01 file02

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-03T16:54:42+00:00

Read your values from file01 into one or more arrays. You can use getline in the BEGIN block or the canonical way is to use a FNR == NR loop as one of the main blocks.

FNR == NR {array[$1] = $1; ...; next } # read file01 into some arrays
{ for (item in array) { ... }     # process each item in the array(s) against each line in file02

Your script would be invoked as awk '...' file01 file02

Instead of indexing the arrays by field values, you could index them with a counter array1[c] = $1; array2[c] = $2; c++ and iterate with a counter instead of using in: for (i=0; i<c; i++).

Of course, you should choose meaningful array names.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have 2 data files: file01 and file02 . In both data sets fields

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply