The Archive Base

Editorial Team
Asked: May 28, 2026 (2026-05-28T11:05:22+00:00)

Say I have two log files (input.log and output.log) with the following format:

2012-01-16T12:00:00 12345678

The first field is the processing timestamp and the second is a unique ID. I’m trying to find:

  1. The records from input.log which don’t have a corresponding record for that ID in output.log
  2. The records from input.log which have a record for that ID, but the difference in the timestamps exceeds 5 seconds

I have a workaround solution with MySQL, but I’d ideally like to remove the database component and handle it with a shell script.

I have the following, which returns the lines of input.log with an added column if output.log contains the ID:

join -a1 -j2 -o 0 1.1 2.1 <(sort -k2,2 input.log) <(sort -k2,2 output.log)

Example output:

10111 2012-01-16T10:00:00 2012-01-16T10:00:04
11562 2012-01-16T11:00:00 2012-01-16T11:00:10
97554 2012-01-16T09:00:00

Main question:

Now that I have this information, how can I go about computing the differences between the 2 timestamps and discarding those over 5 seconds apart? I hit some problems processing the ISO 8601 timestamp with date (specifically the T) and assumed there must be a better way.

Edit: GNU coreutils has supported ISO 8601 timestamps (including the T separator) since late 2011, not long after this question was asked, so this is likely no longer an issue for anyone.

Secondary question:

Is there perhaps a way to rework the entire approach, for instance into a single awk script? My knowledge of processing multiple files and setting up the correct inequalities for the output conditions was the limiting factor here, hence the approach above.

1 Answer

Editorial Team added an answer on May 28, 2026 at 11:05 am (2026-05-28T11:05:23+00:00)

    If you have GNU awk, then you can try something like this –

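    gawk '
    NR==FNR{a[$2]=$1;next}              # output.log: store timestamp by ID
    !($2 in a) {print $2,$1; next}      # input.log: ID missing from output.log
    ($2 in a) {
      # close() each command so its pipe is released and repeated timestamps are re-read
      cmd = "date +%s -d " $1;    cmd | getline var1; close(cmd);
      cmd = "date +%s -d " a[$2]; cmd | getline var2; close(cmd);
      var3 = var2 - var1;               # seconds from input to output
      if (var3 > 4) print $2, $1, a[$2]
    }' output.log input.log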
    

    Test:

    [jaypal:~/Temp] cat input.log 
    2012-01-16T09:00:00 9
    2012-01-16T10:00:00 10
    2012-01-16T11:00:00 11
    
    [jaypal:~/Temp] cat output.log 
    2012-01-16T10:00:04 10
    2012-01-16T11:00:10 11
    2012-01-16T12:00:00 12
    
    [jaypal:~/Temp] gawk '
    NR==FNR{a[$2]=$1;next} 
    !($2 in a) {print $2,$1; next} 
    ($2 in a) {"date +%s -d " $1 | getline var1; "date +%s -d " a[$2] | getline var2;var3=var2-var1;if (var3>4) print $2,$1,a[$2] }' output.log input.log
    9 2012-01-16T09:00:00
    11 2012-01-16T11:00:00 2012-01-16T11:00:10
    

    Explanation:

    • NR==FNR{a[$2]=$1;next}

    We start by storing the first field (the timestamp) of each output.log line in an array indexed by the second field (the ID). NR==FNR is true only while awk is reading the first file named on the command line, so this block slurps output.log completely. We use next to prevent the other pattern{action} statements from running on those lines.
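The two-file idiom described above can be tried in isolation. The following is a minimal sketch, not part of the original answer: the sample data, the temporary directory, and the array name `seen` are illustrative assumptions. It only reports whether each ID from the second file was stored while reading the first.

```shell
#!/bin/sh
# Hypothetical sample data, written to a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' '2012-01-16T10:00:04 10' '2012-01-16T11:00:10 11' > "$tmp/output.log"
printf '%s\n' '2012-01-16T09:00:00 9' '2012-01-16T10:00:00 10' > "$tmp/input.log"

# NR==FNR holds only while the first file is being read, so the first
# rule fills the array and "next" keeps the second rule from running.
result=$(awk '
NR == FNR { seen[$2] = $1; next }
{ print $2, (($2 in seen) ? "matched " seen[$2] : "unmatched") }
' "$tmp/output.log" "$tmp/input.log")

printf '%s\n' "$result"
rm -rf "$tmp"
```

One caveat with this idiom: if the first file is empty, NR==FNR is also true for every line of the second file, so the second rule never runs.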

    • !($2 in a) {print $2,$1; next}

    Once output.log has been read completely, we move on to input.log. For each line, we check whether its second field (the ID) is missing from our array, i.e. has no record in output.log. If it is missing, we print the ID and the timestamp, and next moves on to the following line.

    • ($2 in a) {"date +%s -d " $1 | getline var1; "date +%s -d " a[$2] | getline var2; var3=var2-var1; if (var3 > 4) print $2,$1,a[$2] }

    This block handles IDs that are present in both files, where we need to calculate the difference between the timestamps. We run the external date command to convert each timestamp to seconds since the epoch. awk's system() function would send the command's output straight to STDOUT, where we cannot capture it, so instead we pipe the command into awk's getline and store the result in a variable (var1 and var2). Once both values are stored, we subtract them and keep the difference in var3; if var3 is > 4, we print the line in the format you want. Note that > 4 reports differences of 5 seconds or more; to report only differences that strictly exceed 5 seconds, use > 5.
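The three steps above can also be collected into one script with the conversion wrapped in a helper. This is a sketch under stated assumptions rather than a definitive implementation: the function name `epoch`, the variable `limit`, and the array name `out` are illustrative, GNU date is assumed for `-d`, and `-u` is added so that the timestamps (which carry no zone offset) are treated as UTC. Nothing in it is specific to gawk, so plain awk is used. The threshold is strict here (`diff > limit`), matching "exceeds 5 seconds" in the question.

```shell
#!/bin/sh
# Hypothetical sample data, mirroring the test files in the answer.
tmp=$(mktemp -d)
printf '%s\n' '2012-01-16T09:00:00 9' '2012-01-16T10:00:00 10' \
              '2012-01-16T11:00:00 11' > "$tmp/input.log"
printf '%s\n' '2012-01-16T10:00:04 10' '2012-01-16T11:00:10 11' \
              '2012-01-16T12:00:00 12' > "$tmp/output.log"

result=$(awk -v limit=5 '
# Convert an ISO 8601 timestamp to epoch seconds via GNU date (assumed).
# Returns -1 for anything that does not look like a plain timestamp,
# which also keeps odd characters out of the shell command.
function epoch(ts,   cmd, secs) {
    if (ts !~ /^[0-9T:-]+$/) return -1
    cmd = "date -u -d " ts " +%s"
    secs = -1
    cmd | getline secs
    close(cmd)      # release the pipe so repeated timestamps are re-read
    return secs + 0
}
NR == FNR    { out[$2] = $1; next }       # first file: output.log
!($2 in out) { print $2, $1; next }       # ID never reached output.log
{
    t_in = epoch($1); t_out = epoch(out[$2])
    if (t_in < 0 || t_out < 0) {
        print "bad timestamp for ID " $2 > "/dev/stderr"
        next
    }
    diff = t_out - t_in                   # seconds from input to output
    if (diff > limit) print $2, $1, out[$2]
}
' "$tmp/output.log" "$tmp/input.log")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With this sample data the script reports ID 9 (missing from output.log) and ID 11 (10 seconds apart), and skips ID 10 (4 seconds apart). Passing a different value with `-v limit=...` changes the threshold without editing the program.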
