The Archive Base

Editorial Team
Asked: June 9, 2026

Problem Statement: I need to search for a particular string pattern in around 10000 files


Problem Statement:-

I need to search for a particular string pattern in around 10000 files and find the records in those files that contain the pattern. I can use grep here, but it takes a long time.

Below is the command I am using to search for the pattern after decompressing the dat.gz files:

gzcat /data/newfolder/real-time-newdata/*_20120809_0_*.gz | grep 'b295ed051380a47a2f65fb75ff0d7aa7^]3^]-1'

If I simply count how many files there are after unzipping the dat.gz files above:

gzcat /data/newfolder/real-time-newdata/*_20120809_0_*.gz | wc -l

I get around 10000 files. I need to search all of them for the string pattern above and find the records that contain it. The command above works, but it is very slow.

What is the best approach here? Should we take 100 files at a time and search those 100 files for the pattern in parallel?

Note:

I am running SunOS

bash-3.00$ uname -a
SunOS lvsaishdc3in0001 5.10 Generic_142901-02 i86pc i386 i86pc
1 Answer

  1. Editorial Team
    Added an answer on June 9, 2026 at 4:22 pm

    Do NOT read the files in parallel! That is going to bounce the disk head all over the place, and it will be much slower.

    Since you are reading compressed files, there is one way to get a substantial performance boost: do not write the results of the decompression out to disk. Ideally, decompress to a stream in memory; if that is not viable, decompress to a ramdisk.

    You do still want some parallelism here, in the form of a pipeline: one thread should obtain the data and hand it off to another that does the search. That way you are only ever waiting on the disk or on the core doing the decompression, and none of that time is spent on the search.
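A shell pipe such as the `gzcat ... | grep` in the question already works this way: the decompressor and the searcher run as separate processes connected by an in-memory buffer, and nothing is written to disk. The sketch below is only an illustration of that hand-off. It assumes the pattern is meant as a literal string (hence `grep -F`), and `search_gz` is a hypothetical helper name that is not part of the original post. On Solaris 10 the `-F` option is provided by `/usr/xpg4/bin/grep` (or use `fgrep`), and `gzip -dc` is equivalent to `gzcat`.

```shell
#!/bin/bash
# search_gz PATTERN FILE...   (hypothetical helper, not from the original post)
# Reads the .gz files one after another, so disk access stays sequential,
# and streams the decompressed bytes through a pipe into a fixed-string search.
search_gz() {
    local pattern=$1 f
    shift
    for f in "$@"; do
        # gzip -dc writes the decompressed data to stdout; no temporary file.
        # Both sides of the pipe run concurrently as separate processes.
        gzip -dc "$f" | grep -F -- "$pattern"
    done
}
```

It could be called as `search_gz "$pattern" /data/newfolder/real-time-newdata/*_20120809_0_*.gz`. Treating the pattern as a fixed string is an assumption; if it is really a regular expression, drop `-F`.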

    (Note that with a ramdisk you will want to read the decompressed files promptly and then delete them so the ramdisk does not fill up.)
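If the ramdisk route is taken, the read-then-delete step might look like the simple sequential sketch below. It assumes the caller passes in a memory-backed directory (on Solaris, `/tmp` is typically tmpfs), that `mktemp` is available, and that the pattern is a literal string. `search_gz_ramdisk` is a hypothetical name.

```shell
#!/bin/bash
# search_gz_ramdisk RAMDIR PATTERN FILE...   (hypothetical helper)
# Decompresses one file at a time into RAMDIR, searches it, and deletes it
# right away so the ramdisk never holds more than one decompressed file.
search_gz_ramdisk() {
    local ramdir=$1 pattern=$2 f tmp
    shift 2
    for f in "$@"; do
        tmp=$(mktemp "$ramdir/search.XXXXXX") || return 1
        # Search only if decompression succeeded.
        gzip -dc "$f" > "$tmp" && grep -F -- "$pattern" "$tmp"
        rm -f "$tmp"    # free the ramdisk space before the next file
    done
}
```

Because each temporary file is removed before the next one is created, the space needed is bounded by the largest single decompressed file.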

