
The Archive Base

Editorial Team
Asked: May 25, 2026


I am currently dealing with log files of approximately 5 GB. I’m quite new to parsing log files and using UNIX bash, so I’ll try to be as precise as possible. While searching through log files, I do the following: provide the request number to look for, then optionally provide the action as a secondary filter. A typical command looks like this:

fgrep '2064351200' example.log | fgrep 'action: example'

This is fine when dealing with smaller files, but with a log file that is 5 GB, it’s unbearably slow. I’ve read online that it’s great to use sed or awk to improve performance (or possibly even a combination of both), but I’m not sure how this is accomplished. For example, using awk, I have a typical command:

awk '/2064351200/ {print}' example.log

Basically, my ultimate goal is to be able to efficiently print/return the records (or line numbers) in a log file that contain the strings to match (there could be up to 4-5, and I’ve read piping is bad).

On a side note, in bash shell, if I want to use awk and do some processing, how is that achieved? For example:

BEGIN { print "File\tOwner" }
{ print $8, "\t", \
$3}
END { print " - DONE -" }

That is a pretty simple awk script, and I would assume there’s a way to put this into a one liner bash command? But I’m not sure how the structure is.

Thanks in advance for the help. Cheers.

1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 12:35 am

    You need to perform some tests to find out where your bottlenecks are, and how fast your various tools perform. Try some tests like this:

    time fgrep '2064351200' example.log >/dev/null
    time egrep '2064351200' example.log >/dev/null
    time sed -e '/2064351200/!d' example.log >/dev/null
    time awk '/2064351200/ {print}' example.log >/dev/null
    

    Traditionally, egrep should be the fastest of the bunch (yes, faster than fgrep), but some modern implementations are adaptive and automatically switch to the most appropriate searching algorithm. If you have bmgrep (which uses the Boyer-Moore search algorithm), try that. Generally, sed and awk will be slower because they’re designed as more general-purpose text manipulation tools rather than being tuned for the specific job of searching. But it really depends on the implementation, and the correct way to find out is to run tests. Run them each several times so you don’t get messed up by things like caching and competing processes.
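The repeated timing runs described above could be scripted along these lines. This is only a sketch: `bench` is a hypothetical helper name, not a standard tool, and it assumes a shell such as bash where `time` is a keyword whose report goes to stderr.

```shell
#!/usr/bin/env bash
# bench RUNS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND several times, timing each run.
# The command's own output is discarded; the run label goes to stdout
# and the timing report from `time` goes to stderr.
bench() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        echo "run $i: $*"
        time "$@" >/dev/null
        i=$((i + 1))
    done
}
```

For example, `bench 3 fgrep '2064351200' example.log` followed by `bench 3 awk '/2064351200/ {print}' example.log` gives three timings per tool, which makes it easier to spot a first run that was slowed by a cold disk cache.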

    As @Ron pointed out, your search process may be disk I/O bound. If you will be searching the same log file a number of times, it may be faster to compress the log file first; this makes it faster to read off disk, but then requires more CPU time to process because it has to be decompressed first. Try something like this:

    compress -c example.log >example.log.Z
    time zgrep '2064351200' example.log.Z >/dev/null
    gzip -c example.log >example.log.gz
    time zgrep '2064351200' example.log.gz >/dev/null
    bzip2 -k example.log
    time bzgrep '2064351200' example.log.bz2 >/dev/null
    

    I just ran a quick test with a fairly compressible text file, and found that bzip2 compressed best, but then took far more CPU time to decompress, so the gzip option wound up being fastest overall. Your computer will have different disk and CPU performance than mine, so your results may be different. If you have any other compressors lying around, try them as well, and/or try different levels of gzip compression, etc.
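To compare gzip levels before committing to one, something like the following might help. It is a sketch only: `gzip_level_sizes` is a made-up helper name, and the three levels shown are an arbitrary choice. It reports compressed sizes, so decompression speed still needs to be timed separately.

```shell
#!/usr/bin/env bash
# gzip_level_sizes FILE
# Hypothetical helper: print the compressed size of FILE at a few gzip
# levels (1 = fastest, 9 = smallest) without writing any files.
gzip_level_sizes() {
    for level in 1 6 9; do
        # $(( ... )) strips the padding some wc implementations add.
        size=$(($(gzip -c "-$level" "$1" | wc -c)))
        echo "gzip -$level: $size bytes"
    done
}
```

Running `gzip_level_sizes example.log` prints one line per level; on a 5 GB log each level means a full compression pass, so trying it on a representative slice first may be more practical.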

    Speaking of preprocessing: if you’re searching the same log over and over, is there a way to preselect out just the log lines that you might be interested in? If so, grep them out into a smaller (maybe compressed) file, then search that instead of the whole thing. As with compression, you spend some extra time up front, but then each individual search runs faster.

    A note about piping: other things being equal, piping a huge file through multiple commands will be slower than having a single command do all the work. But all things are not equal here, and if using multiple commands in a pipe (which is what zgrep and bzgrep do) buys you better overall performance, go for it. Also, consider whether you’re actually passing all of the data through the entire pipe. In the example you gave, fgrep '2064351200' example.log | fgrep 'action: example', the first fgrep will discard most of the file; the pipe and second command only have to process the small fraction of the log that contains ‘2064351200’, so the slowdown will likely be negligible.
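If you do want a single command to check several strings at once (the question mentions up to 4-5, plus line numbers), one possible approach is a single awk pass that uses `index()` for fixed-string matching. This is a sketch under assumptions: `match_all` is a hypothetical name, and whether it beats the two-stage fgrep pipe on your system is something only a timing test will tell.

```shell
#!/usr/bin/env bash
# match_all FILE STRING [STRING...]
# Hypothetical helper: print "lineno: line" for every line of FILE that
# contains all of the given fixed strings (no regex interpretation).
match_all() {
    file=$1
    shift
    awk '
        BEGIN {
            # Arguments before the last one are search strings, not files.
            n = ARGC - 2
            for (i = 1; i <= n; i++) {
                needle[i] = ARGV[i]
                ARGV[i] = ""    # awk skips empty ARGV entries as inputs
            }
        }
        {
            # Skip the line as soon as one string is missing.
            for (i = 1; i <= n; i++)
                if (index($0, needle[i]) == 0) next
            print NR ": " $0
        }
    ' "$@" "$file"
}
```

For example, `match_all example.log '2064351200' 'action: example'` does the work of the two-stage fgrep pipe in one process and adds the line number to each record.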

    tl;dr TEST ALL THE THINGS!

    EDIT: if the log file is “live” (i.e. new entries are being added), but the bulk of it is static, you may be able to use a partial preprocess approach: compress (& maybe prescan) the log, then when scanning use the compressed (&/prescanned) version plus a tail of the part of the log added since you did the prescan. Something like this:

    # Precompress a snapshot of the log, recording how many bytes it covers
    # (wc -c rather than gzip -l, whose size field can wrap for inputs over 4 GB):
    snapshotsize=$(($(wc -c <example.log)))
    head -c "$snapshotsize" example.log | gzip -c >example.log.gz
    
    # Search the compressed file + recent additions (tail -c +N starts at byte N, hence the +1):
    { gzip -cdfq example.log.gz; tail -c +"$((snapshotsize + 1))" example.log; } | egrep '2064351200'
    

    If you’re going to be doing several related searches (e.g. a particular request, then specific actions with that request), you can save prescanned versions:

    # Prescan for a particular request (repeat for each request you'll be working with):
    gzip -cdfq example.log.gz | egrep '2064351200' > prescan-2064351200.log
    
    # Search the prescanned file + recent additions:
    { cat prescan-2064351200.log; tail -c +"$((snapshotsize + 1))" example.log | egrep '2064351200'; } | egrep 'action: example'
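On the side question about running a small awk script from bash: one common way is to pass the whole program as a single-quoted argument, so the shell leaves `$8` and `$3` for awk to interpret. The sketch below wraps the script from the question in a hypothetical function named `file_owner_report`; which fields actually hold the file name and owner depends on your input.

```shell
#!/usr/bin/env bash
# file_owner_report [FILE...]
# Hypothetical wrapper around the awk script from the question. The
# program is one single-quoted string; newlines between the BEGIN, main
# and END blocks are optional, so it fits on one line.
file_owner_report() {
    awk 'BEGIN { print "File\tOwner" } { print $8, "\t", $3 } END { print " - DONE -" }' "$@"
}
```

It reads files named as arguments or, with none, standard input (for example `some_command | file_owner_report`). For longer programs, saving the script to a file and running `awk -f script.awk input` is usually easier to maintain.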
    


© 2021 The Archive Base. All Rights Reserved