Editorial Team
Asked: June 3, 2026

I have 2 data files: file01 and file02 . In both data sets fields

I have 2 data files: file01 and file02. In both data sets the fields are: (i) an identifier; (ii) a numeric reference; (iii) longitude; and (iv) latitude.
For each row in file01, I want to search file02 for the rows with the same numeric reference and then find the identifier in file02 whose position is nearest to that of the identifier in file01.

I can get this if I manually pass the values from file01 to the awk program, using the following code:

awk 'function acos(x) { return atan2(sqrt(1-x*x), x) }
BEGIN {pi=3.14159;
       ndist=999999999.1;
       date=1001;
       lo1=-1.20; lg1=lo1*(pi/180);
       la1=30.31; lt1=la1*(pi/180)
           }
{if($2==date) {ws=$1;
               lg2=$3*(pi/180);
               lt2=$4*(pi/180);
               dist= 6378.7 * acos( sin(lt1)*sin(lt2) + cos(lt1)*cos(lt2)*cos(lg2-lg1) );
               if(dist < ndist) {ndist=dist; ws0=ws}}}
END {print(ws0,ndist)}' file02
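
For reference, the distance expression in the script above is the spherical law of cosines. Written out, with latitudes \varphi (lt1, lt2) and longitudes \lambda (lg1, lg2) in radians and R = 6378.7 km as used in the script, it reads as follows. The second identity is what the user-defined acos() relies on, since awk only provides atan2:

```latex
d = R \,\arccos\bigl(\sin\varphi_1 \sin\varphi_2
      + \cos\varphi_1 \cos\varphi_2 \cos(\lambda_2 - \lambda_1)\bigr),
\qquad
\arccos x = \operatorname{atan2}\bigl(\sqrt{1 - x^2},\; x\bigr), \quad -1 \le x \le 1
```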

As you can see, date, lo1 and la1 in the BEGIN block are the values from the 1st row of file01 (see below for the data files). My question is whether I could do this in one pass, so that each time I read a row in file01, I get the nearest identifier and its distance and append them to that row of file01. I do not know whether some shell command could achieve this in an easier way, maybe using a pipe.

An example of these two data files and the desired output are:

=== file01 ===

A 1001 -1.2 30.31
A 1002 -1.2 30.31
B 1002 -1.8 30.82
B 1003 -1.8 30.82
C 1001 -2.1 28.55

=== file02 ===

ws1 1000 -1.3 29.01
ws1 1001 -1.3 29.01
ws1 1002 -1.3 29.01
ws1 1003 -1.3 29.01
ws1 1004 -1.3 29.01
ws1 1005 -1.3 29.01
ws2 1000 -1.5 30.12
ws2 1002 -1.5 30.12
ws2 1003 -1.5 30.12
ws2 1004 -1.5 30.12
ws2 1005 -1.5 30.12
ws3 1000 -1.7 29.55
ws3 1001 -1.7 29.55
ws3 1002 -1.7 29.55
ws3 1003 -1.7 29.55
ws3 1004 -1.7 29.55
ws3 1005 -1.7 29.55
ws4 1000 -1.9 30.33
ws4 1001 -1.9 30.33
ws4 1002 -1.9 30.33
ws4 1003 -1.9 30.33
ws4 1004 -1.9 30.33
ws4 1005 -1.9 30.33

=== output file ===

A 1001 -1.2 30.31 ws4 67.308
A 1002 -1.2 30.31 ws2 35.783
B 1002 -1.8 30.82 ws4 55.387
B 1003 -1.8 30.82 ws4 55.387
C 1001 -2.1 28.55 ws1 85.369

EDIT #1: Considering the suggestion by @Eran, I wrote the following code:

join -j 2 <(sort -k 2,2 file01) <(sort -k 2,2 file02) |
awk 'function acos(x) { return atan2(sqrt(1-x*x), x) }
     BEGIN {pi=3.14159}

     {if (last != $1 $2)
         {print NR, id,r,lon,lat,ws0,ndist;
          last = $1 $2;
          ndist=999999999.1

         } else {

          lg1=$3*(pi/180);
          lt1=$4*(pi/180);
          lg2=$6*(pi/180);
          lt2=$7*(pi/180);
          dist= 6378.7 * acos( sin(lt1)*sin(lt2) + cos(lt1)*cos(lt2)*cos(lg2-lg1) );
          if(dist< ndist) {ndist=dist; ws0=$5}
          id=$2;r=$1;lon=$3;lat=$4

          }
     }'

The output from this script is:

1      
4  A 1001 -1.2 30.31 ws4 67.3078
7  C 1001 -2.0 28.55 ws3 115.094
11 A 1002 -1.2 30.31 ws2 35.7827
15 B 1002 -1.8 30.82 ws4 55.387
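
The output above lacks the B 1003 row and reports ws3 for C. Tracing the script suggests two causes: the first joined row of each group only resets the state and is never measured, and the last group is never printed because there is no END block. The following is a minimal sketch of how the join-based idea could be repaired, not a definitive implementation. The names nearest_by_join and emit are hypothetical, temporary files replace process substitution so that it runs in a plain POSIX shell, and the %.3f format is an assumption chosen to match the desired output. It also assumes join emits all matches for one file01 row consecutively, which GNU join does.

```shell
# Hypothetical helper: nearest_by_join FILE01 FILE02
nearest_by_join() (
    work=$(mktemp -d) || exit 1
    # join needs both inputs sorted on the join field (field 2, the reference)
    sort -k 2,2 "$1" > "$work/a"
    sort -k 2,2 "$2" > "$work/b"
    # Joined rows look like: ref id lon lat station lon2 lat2
    join -1 2 -2 2 "$work/a" "$work/b" | awk '
    function acos(x) { return atan2(sqrt(1 - x*x), x) }
    # Print the finished group, if there is one
    function emit() {
        if (seen) printf "%s %s %s %s %s %.3f\n", id, ref, lon, lat, best, bestd
    }
    BEGIN { pi = 3.14159; rad = pi / 180 }   # constants copied from the question
    {
        key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $4
        if (!seen || key != last) {          # a new file01 row starts here
            emit()
            last = key; seen = 1
            ref = $1; id = $2; lon = $3; lat = $4
            bestd = -1                       # -1 means "no candidate yet"
        }
        # Measure every joined row, including the first one of a group
        d = 6378.7 * acos(sin($4*rad)*sin($7*rad) + cos($4*rad)*cos($7*rad)*cos($6*rad - $3*rad))
        if (bestd < 0 || d < bestd) { bestd = d; best = $5 }
    }
    END { emit() }                           # do not lose the last group
    '
    rm -rf "$work"
)
```

Called as nearest_by_join file01 file02, this prints one line per file01 row, ordered by reference because of the sort rather than in file01 order. Identical duplicate rows in file01 would be merged into a single group.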

EDIT #2: Using the suggestion of @Dennis (with some modifications), I got the desired output. The awk script is as follows:

awk 'function acos(x) { return atan2(sqrt(1-x*x), x) }
     BEGIN {pi=3.14159}
     NR==FNR {c++; a1[c]=$1;a2[c]=$2;a3[c]=$3;a4[c]=$4; next}
             {d++; b1[d]=$1;b2[d]=$2;b3[d]=$3;b4[d]=$4}

     END {
     for(k=1;k<=c;k++) {
         lg1=a3[k]*(pi/180);
         lt1=a4[k]*(pi/180);
         ndist=999999999.1;
         for(l=1;l<=d;l++) {
             if(b2[l]==a2[k]) {kk=b2[l];
                lg2=b3[l]*(pi/180);
                lt2=b4[l]*(pi/180);
                dist= 6378.7 * acos( sin(lt1)*sin(lt2) + cos(lt1)*cos(lt2)*cos(lg2-lg1) );
                if(dist<ndist) {ndist=dist; ws0=b1[l]}
             }
         }
         print a1[k],a2[k],a3[k],a4[k],ws0,ndist
     }
    }' file01 file02

1 Answer

  1. Editorial Team
    Added an answer on June 3, 2026 at 4:54 pm

    Read the values from file01 into one or more arrays. You can use getline in the BEGIN block, but the canonical way is to use an FNR == NR block as one of the main blocks:

    FNR == NR { array[$1] = $1; ...; next }   # read file01 into some arrays
    { for (item in array) { ... } }           # process each item in the array(s) against each line in file02

    Your script would be invoked as awk '...' file01 file02

    Instead of indexing the arrays by field values, you could index them with a counter (array1[c] = $1; array2[c] = $2; c++) and iterate with a counter instead of using in: for (i = 0; i < c; i++).

    Of course, you should choose meaningful array names.
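
Putting these pieces together, here is a minimal sketch under stated assumptions rather than a definitive implementation. It fills counter-indexed arrays from file01 in an FNR == NR block, then streams file02 once while keeping a running minimum per file01 row. The function name nearest_station and the array names are hypothetical, the clamp inside acos() is an added guard against rounding pushing the argument just outside [-1, 1], and the %.3f format is an assumption chosen to match the desired output in the question.

```shell
# Hypothetical helper: nearest_station FILE01 FILE02
nearest_station() {
    awk '
    # awk has no acos(); derive it from atan2 as in the question
    function acos(x) {
        if (x > 1) x = 1                 # added guard, not in the original
        if (x < -1) x = -1
        return atan2(sqrt(1 - x*x), x)
    }

    BEGIN { pi = 3.14159; rad = pi / 180; R = 6378.7 }   # constants copied from the question

    # First file (file01): remember every row by its position
    FNR == NR {
        n++
        row[n] = $0; ref[n] = $2
        lon[n] = $3 * rad; lat[n] = $4 * rad
        best[n] = ""; bestd[n] = -1      # -1 means "no candidate yet"
        next
    }

    # Second file (file02): test this station against every file01 row
    {
        lg2 = $3 * rad; lt2 = $4 * rad
        for (i = 1; i <= n; i++) {
            if (ref[i] != $2) continue   # only compare equal references
            d = R * acos(sin(lat[i])*sin(lt2) + cos(lat[i])*cos(lt2)*cos(lg2 - lon[i]))
            if (bestd[i] < 0 || d < bestd[i]) { bestd[i] = d; best[i] = $1 }
        }
    }

    # Print the file01 rows in their original order with the winner appended
    END {
        for (i = 1; i <= n; i++)
            printf "%s %s %.3f\n", row[i], best[i], bestd[i]
    }' "$1" "$2"
}
```

Called as nearest_station file01 file02, this keeps the file01 row order and needs to hold only file01 in memory. A row with no matching reference in file02 is printed with an empty station and -1.000. With the sample rows exactly as listed, the C row should come out at roughly 93 km; the 85.369 in the desired output appears to correspond to a longitude of -2.0, the value shown in the EDIT #1 output.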
