The Archive Base


The Archive Base Latest Questions

Editorial Team
Asked: May 13, 2026 (2026-05-13T11:47:15+00:00)

Consider the following shell script: gzip -dc in.gz | sed -e 's/@/_at_/g' | gzip

Consider the following shell script:

gzip -dc in.gz | sed -e 's/@/_at_/g' | gzip -c > out.gz 

This runs three processes in parallel to decompress a stream, modify it, and re-compress it. Running it under time, I can see that my user time is about twice my real time, which indicates the pipeline is effectively working in parallel.

I’ve attempted to create the same program in Java by placing each task in its own thread. Unfortunately, the multithreaded Java program is only about 30% faster than the single-threaded version for the above sample. I’ve tried using both an Exchanger and a ConcurrentLinkedQueue. The ConcurrentLinkedQueue causes a lot of contention, although all three threads are generally kept busy. The Exchanger has lower contention but is more complicated, and it doesn’t seem to keep the slowest worker running 100% of the time.

I’m trying to figure out a pure Java solution to this problem without resorting to one of the bytecode-weaving frameworks or a JNI-based MPI.

Most of the concurrency research and APIs concern themselves with divide-and-conquer algorithms, giving each node work which is orthogonal and non-dependent on prior calculations. Another approach to concurrency is the pipeline approach, where each worker does some work and passes the data onto the next worker.

I’m not trying to find the most efficient way to sed a gzip’d file, but rather I’m looking at how to efficiently break down tasks in a pipeline, in order to reduce the runtime to that of the slowest task.
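For reference, here is a minimal sketch of the pipeline shape described above. It is not taken from the Bitbucket sources: it swaps in bounded ArrayBlockingQueues that hand over whole batches of lines, so the synchronization cost is paid once per batch rather than once per line. The class name PipelineSketch and the BATCH and DEPTH values are assumptions for illustration, not measured settings.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class PipelineSketch {
    // Sentinel batch meaning "no more data"; compared by identity, never by content.
    private static final List<String> EOF = new ArrayList<>();
    private static final int BATCH = 1024; // lines per hand-off (assumed value)
    private static final int DEPTH = 16;   // batches buffered between stages (assumed value)

    static void run(InputStream gzIn, OutputStream gzOut) throws Exception {
        BlockingQueue<List<String>> raw = new ArrayBlockingQueue<>(DEPTH);
        BlockingQueue<List<String>> edited = new ArrayBlockingQueue<>(DEPTH);
        ExecutorService pool = Executors.newFixedThreadPool(3);
        CompletionService<Void> stages = new ExecutorCompletionService<>(pool);
        try {
            // Stage 1: decompress and split into batches of lines.
            stages.submit(() -> {
                try (BufferedReader in = new BufferedReader(new InputStreamReader(
                        new GZIPInputStream(gzIn), StandardCharsets.UTF_8))) {
                    List<String> batch = new ArrayList<>(BATCH);
                    for (String line; (line = in.readLine()) != null; ) {
                        batch.add(line);
                        if (batch.size() == BATCH) {
                            raw.put(batch);
                            batch = new ArrayList<>(BATCH);
                        }
                    }
                    if (!batch.isEmpty()) raw.put(batch);
                    raw.put(EOF);
                }
                return null;
            });
            // Stage 2: the sed step, applied to each line of a batch in place.
            stages.submit(() -> {
                for (List<String> batch; (batch = raw.take()) != EOF; ) {
                    batch.replaceAll(s -> s.replace("@", "_at_"));
                    edited.put(batch);
                }
                edited.put(EOF);
                return null;
            });
            // Stage 3: re-compress. readLine() dropped the terminators, so every
            // line is written back with a single '\n'.
            stages.submit(() -> {
                try (Writer out = new BufferedWriter(new OutputStreamWriter(
                        new GZIPOutputStream(gzOut), StandardCharsets.UTF_8))) {
                    for (List<String> batch; (batch = edited.take()) != EOF; ) {
                        for (String s : batch) {
                            out.write(s);
                            out.write('\n');
                        }
                    }
                }
                return null;
            });
            // Wait in completion order so the first failure surfaces immediately.
            for (int i = 0; i < 3; i++) stages.take().get();
        } finally {
            pool.shutdownNow(); // interrupts any stage still blocked on a queue
        }
    }
}
```

The bounded queues give back-pressure: a fast stage blocks instead of filling the heap, and in the ideal case the wall-clock time approaches that of the slowest stage. Whether this beats the Exchanger version on the 10-million-line file would have to be measured.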

Current timings for a 10-million-line file are as follows:

Testing via shell

real    0m31.848s
user    0m58.946s
sys     0m1.694s

Testing SerialTest

real    0m59.997s
user    0m59.263s
sys     0m1.121s

Testing ParallelExchangerTest

real    0m41.573s
user    1m3.436s
sys     0m1.830s

Testing ConcurrentQueueTest

real    0m44.626s
user    1m24.231s
sys     0m10.856s

I’m offering a bounty for a 10% improvement in Java, as measured by real time on a four core system with 10m rows of test data. Current sources are available on Bitbucket.


1 Answer

  1. Editorial Team
    Added an answer on May 13, 2026 at 11:47 am

    I timed each stage separately. Reading takes less than 10% of the total time, and reading plus processing takes less than 30%.
    So I took ParallelExchangerTest (the best performer in your code) and modified it to
    use just two threads: the first thread does the reading and replacing, and the second thread does the writing.
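The two-thread split can be sketched roughly like this. It is not the code from the fork, only an illustration of the hand-off: the calling thread reads and replaces, a second thread "writes", and the two swap buffers through an Exchanger. The class name ExchangerHandoff is made up, the in-memory list stands in for the gzip writer, and batchSize is assumed to be at least 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Exchanger;

final class ExchangerHandoff {
    static List<String> run(Iterable<String> lines, int batchSize) throws InterruptedException {
        Exchanger<List<String>> exchanger = new Exchanger<>();
        List<String> written = new ArrayList<>(); // stand-in for the compressed output

        // Writer thread: hands over an empty buffer, receives a full one.
        Thread writer = new Thread(() -> {
            List<String> batch = new ArrayList<>(batchSize);
            try {
                while (true) {
                    batch = exchanger.exchange(batch);
                    if (batch.isEmpty()) return; // an empty batch signals end of input
                    written.addAll(batch);
                    batch.clear();               // reuse the buffer for the next swap
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        // Calling thread: read + replace, swapping buffers whenever one fills up.
        List<String> batch = new ArrayList<>(batchSize);
        for (String line : lines) {
            batch.add(line.replace("@", "_at_"));
            if (batch.size() == batchSize) batch = exchanger.exchange(batch);
        }
        if (!batch.isEmpty()) batch = exchanger.exchange(batch); // flush the partial batch
        exchanger.exchange(batch); // always empty at this point: the end signal
        writer.join();
        return written;
    }
}
```

Only two buffers ever exist, so allocation stays flat, but each exchange is a rendezvous: whichever thread finishes its buffer first waits for the other.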

    Here are the figures for comparison (my machine is an Intel dual-core, not a Core 2, running Ubuntu with 1 GB of RAM):

    > Testing via shell

    real 0m41.601s

    user 0m58.604s

    sys 0m1.032s

    > Testing ParallelExchangerTest

    real 1m55.424s

    user 2m14.160s

    sys 0m4.768s

    > ParallelExchangerTestMod (2 thread)

    real 1m35.524s

    user 1m55.319s

    sys 0m3.580s

    I knew that the string processing takes a long time, so I replaced line.replace
    with matcher.replaceAll and got these figures:
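A minimal sketch of that kind of change, assuming the original call was String.replace with string arguments: the pattern is compiled once and a single Matcher is reused for every line. As far as I know, on JDKs before Java 9 String.replace(CharSequence, CharSequence) compiled a literal Pattern on each call, which would explain the gain; on newer JDKs the difference may be much smaller. The class name AtReplacer is made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AtReplacer {
    // Compiled once; LITERAL treats "@" as plain text rather than regex syntax.
    private static final Pattern AT = Pattern.compile("@", Pattern.LITERAL);

    // A Matcher holds mutable state, so keep one instance per thread.
    private final Matcher matcher = AT.matcher("");

    String replace(CharSequence line) {
        // reset() points the existing matcher at the new input without reallocating it.
        return matcher.reset(line).replaceAll("_at_");
    }
}
```

Each worker thread would create its own AtReplacer once and call replace(line) in its loop.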

    > ParallelExchangerTestMod_Regex (2 thread)

    real 1m12.781s

    user 1m33.382s

    sys 0m2.916s

    Then I went a step further: instead of reading one line at a time, I read into a
    char[] buffer of various sizes and timed each run (still with the regex search/replace).
    I got these figures:
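Roughly, the buffered variant looks like the sketch below (an illustration, not the exact code from the fork). It reads fixed-size chunks into a char[] and runs the regex over each chunk. This is only safe because the search text is a single character; a longer pattern could straddle two chunks and be missed. The class name ChunkedReplace and the chunkSize parameter are assumptions.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.CharBuffer;
import java.util.regex.Pattern;

final class ChunkedReplace {
    private static final Pattern AT = Pattern.compile("@", Pattern.LITERAL);

    static void copy(Reader in, Writer out, int chunkSize) throws IOException {
        char[] buf = new char[chunkSize];
        for (int n; (n = in.read(buf)) != -1; ) {
            // Wrap only the n chars actually read; no intermediate String is built.
            CharBuffer chunk = CharBuffer.wrap(buf, 0, n);
            out.write(AT.matcher(chunk).replaceAll("_at_"));
        }
    }
}
```

Since line boundaries no longer matter, the per-line overhead of readLine() disappears, and chunkSize becomes the knob to tune.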

    > Testing ParallelExchangerTestMod_Regex_Buff (processing 100 bytes at a time)

    real 1m13.804s

    user 1m32.494s

    sys 0m2.676s

    > Testing ParallelExchangerTestMod_Regex_Buff (processing 500 bytes at a time)

    real 1m6.286s

    user 1m29.334s

    sys 0m2.324s

    > Testing ParallelExchangerTestMod_Regex_Buff (processing 800 bytes at a time)

    real 1m12.309s

    user 1m33.910s

    sys 0m2.476s

    Of the three sizes tested, 500 bytes looks best for this data.

    I forked the repository; a copy of my changes is here:

    https://bitbucket.org/chinmaya/java-concurrent_response/

© 2021 The Archive Base. All Rights Reserved