Editorial Team
Asked: June 9, 2026

I parallelized a Java program. On a Mac with 4 cores, these are the running times for different numbers of threads:

threads            1           2           4           8          16
time      2597192200  1915988600  2086557400  2043377000  1931178200

On a Linux server with two sockets, each with 4 cores, these are the measured times:

threads            1           2           4           8          16
time      4204436859  2760602109  1850708620  2370905549  2422668438

As you can see, the speedup is far from linear. There is almost no parallelization overhead in this case, such as synchronization or I/O dependencies.

I have two questions:

  1. Do these data imply this Java program is memory-bound?
  2. If so, is there any way to further improve the performance without changing the hardware?
1 Answer

  1. Editorial Team
    Added an answer on June 9, 2026 at 7:42 pm

    Answering the Title Question

    Amdahl’s Law explains that the speed-up obtained by parallelizing a program is limited by how much of the program is parallelizable.
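In its usual form, with P the fraction of the single-threaded run time that can be parallelized and N the number of threads (symbols introduced here for illustration), Amdahl’s Law reads:

```latex
S(N) = \frac{T(1)}{T(N)} = \frac{1}{(1 - P) + \frac{P}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - P}
```

For example, with P = 0.5 the program can never run more than twice as fast, however many threads are used.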

    We must also account for the overhead of coordinating the parallel work.

    So we need to consider what fraction of the program is parallelizable, and what overhead (synchronization, communication, false sharing, etc.) the parallel version incurs.
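To see how overhead changes the picture, here is a toy model (a sketch, not a fit to the timings in the question): Amdahl’s Law plus a hypothetical coordination cost that grows linearly with each extra thread. `SpeedupModel`, the linear overhead term, and the parameter values are all assumptions chosen for illustration.

```java
/**
 * Toy model of parallel run time: Amdahl's Law plus a hypothetical
 * coordination overhead that grows linearly with the number of extra threads.
 * All parameter values used here are illustrative assumptions.
 */
class SpeedupModel {
    /**
     * @param t1                run time with one thread
     * @param parallelFraction  fraction P of t1 that can be split across threads
     * @param overheadPerThread assumed extra cost for each additional thread
     * @param n                 number of threads
     */
    static double predictedTime(double t1, double parallelFraction,
                                double overheadPerThread, int n) {
        double serial = t1 * (1.0 - parallelFraction);   // never gets faster
        double parallel = t1 * parallelFraction / n;     // ideal split across n threads
        double overhead = overheadPerThread * (n - 1);   // zero for one thread
        return serial + parallel + overhead;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 4, 8, 16}) {
            // Normalized single-thread time 1.0, P = 0.8, overhead 0.01 per extra thread.
            System.out.printf("%2d threads: %.2f%n", n, predictedTime(1.0, 0.8, 0.01, n));
        }
    }
}
```

With these made-up parameters the predicted time falls until about 8 threads and then rises again, because the overhead term eventually outgrows the shrinking parallel term.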

    Is Reading Memory Parallelizable?

    From hard drive

    You can read from two different hard disk drives at the same time without a slowdown.

    But parallelism usually does not speed up reads from a single hard drive.

    Hard disk drives (i.e. drives with a spinning disk) are optimized for sequential reads, and jumping back and forth between locations on the disk slows down the overall transfer.

    Solid-state drives, by contrast, are quite good at random access, jumping here and there in storage, so with solid-state drives it is a good idea to keep the read/write queue full.
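A minimal sketch of keeping several reads in flight at once, the access pattern that suits SSDs. It relies on `FileChannel.read(ByteBuffer, long)`, a positional read that several threads may call concurrently on one channel. `ConcurrentChunkReader`, the chunk size, and the thread count are illustrative choices, and whether this helps on a given drive is something to measure rather than assume.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ConcurrentChunkReader {
    /**
     * Reads the whole file in fixed-size chunks, with up to `threads` chunks
     * in flight at once. Returns the total number of bytes read.
     */
    static long readConcurrently(Path file, int chunkSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            List<Future<Integer>> results = new ArrayList<>();
            for (long pos = 0; pos < size; pos += chunkSize) {
                final long start = pos;
                results.add(pool.submit(() -> readChunk(channel, start, chunkSize)));
            }
            long total = 0;
            for (Future<Integer> f : results) {
                total += f.get(); // wait for every chunk before the channel is closed
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Positional read: it does not move the channel's shared position. */
    private static int readChunk(FileChannel channel, long start, int chunkSize) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(chunkSize);
        int total = 0;
        while (buf.hasRemaining()) {
            int n = channel.read(buf, start + total);
            if (n < 0) {
                break; // reached end of file (last chunk may be short)
            }
            total += n;
        }
        return total;
    }
}
```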

    From RAM and Cache

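    Understanding cache lines will help you avoid false sharing: when threads write to different variables that happen to sit on the same cache line, that line bounces between cores even though no data is logically shared.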
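The hypothetical `FalseSharingDemo` below illustrates the effect: each thread increments only its own slot of an `AtomicLongArray` (backed by a contiguous `long[]`), so no data is shared, yet with adjacent slots the threads are likely to contend for the same cache line. The 64-byte line size is an assumption (common on current x86 CPUs), and the timings vary by machine, so compare the two strides on your own hardware.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLongArray;

class FalseSharingDemo {
    // Assumption: 64-byte cache lines, so 8 longs (8 bytes each) span one line.
    static final int PADDING = 8;

    /**
     * Starts `threads` threads; thread t increments only counters[t * stride].
     * With stride 1 the slots are adjacent and likely share a cache line;
     * with stride PADDING they are 64 bytes apart and cannot share a 64-byte line.
     * Returns the elapsed wall-clock time in nanoseconds.
     */
    static long run(int threads, long iterations, int stride, AtomicLongArray counters) {
        List<Thread> workers = new ArrayList<>();
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            final int index = t * stride;
            Thread worker = new Thread(() -> {
                for (long i = 0; i < iterations; i++) {
                    counters.incrementAndGet(index); // atomic write to this thread's own slot
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return System.nanoTime() - start;
    }
}
```

Comparing `run(4, 50_000_000L, 1, new AtomicLongArray(4))` with `run(4, 50_000_000L, FalseSharingDemo.PADDING, new AtomicLongArray(4 * FalseSharingDemo.PADDING))` will often show the padded layout running faster, though the size of the gap depends on the CPU.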

    Reading from RAM can be parallelized effectively, for example by dividing an array into four partitions and letting each thread iterate over its own partition.
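A minimal sketch of the four-partition idea, using a hypothetical `PartitionedSum` class: each task scans its own contiguous slice and accumulates into a local variable, so threads read disjoint regions and never write to shared memory (which also sidesteps false sharing). If the loop is limited by memory bandwidth, adding threads beyond what the memory system can feed will not make it faster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionedSum {
    /** Sums data[] by splitting it into `parts` contiguous partitions, one task each. */
    static long sum(long[] data, int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            int chunk = (data.length + parts - 1) / parts; // ceiling division
            for (int p = 0; p < parts; p++) {
                final int from = Math.min(data.length, p * chunk);
                final int to = Math.min(data.length, from + chunk);
                partials.add(pool.submit(() -> {
                    long s = 0; // local accumulator: no shared writes
                    for (int i = from; i < to; i++) {
                        s += data[i]; // sequential scan within this partition
                    }
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> f : partials) {
                total += f.get(); // combine the partial sums
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```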

    Your Question

    I’m assuming that your times are in nanoseconds, so on computer 1 the program took 2.6 seconds with one thread and then leveled off at about 2 seconds, with a best time of 1.9 seconds.

    I hope you had few background programs running at the same time, and that you repeated these tests several times to smooth out irregularities.

    Timing irregularities can also come from the Just-In-Time (JIT) compiler of the Java virtual machine, so for accurate timings, run the code in a loop a few times and record the time of the last iteration (or pre-compile to native code).
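A minimal sketch of the "loop and keep the last timing" approach. `WarmTimer` and the run count are hypothetical; for rigorous measurements, a dedicated harness such as JMH (the Java Microbenchmark Harness) handles warm-up and similar pitfalls more thoroughly.

```java
class WarmTimer {
    /**
     * Runs the task `runs` times and returns the elapsed nanoseconds of the
     * last run only, so earlier runs absorb JIT compilation and cache warm-up.
     */
    static long timeLastRun(Runnable task, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        long elapsed = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed = System.nanoTime() - start; // overwritten until the final run
        }
        return elapsed;
    }
}
```

For example, `WarmTimer.timeLastRun(() -> runWithThreads(8), 5)`, where `runWithThreads` stands in for your own (hypothetical) entry point.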

    Also, the first time the program runs, much of the data it reads from the hard drive is brought into cache (including the operating system’s file cache), so later executions should be faster. So either use the timing from the last run of a loop, once the data is already cached, or use the first timing but power the computer off and on between timings.

    Is the Program Memory-Bound?

    Based only on your timings, this is hard to say.

    The first computer took 2.6 seconds with one thread, cut its run time by about 26% with 2 threads (to 1.9 seconds, a speed-up of roughly 1.36x), and then stayed at about 2.0 seconds.

    On its own, this speed-up could simply be the result of JIT compilation and of the cache being filled during the 1-thread run. After that, any differences in run time might just be noise.

    The second computer took 4.2 seconds, then 2.8, then 1.9, then rose again to about 2.4 seconds.

    This one does seem to demonstrate some speed-up from parallelism, but some type of contention (memory, cache lines, synchronization, etc.) sets in, as shown by the increase in time from 4 threads to 8 threads. Note that 8 threads can no longer fit on the 4 cores of a single socket, so cross-socket memory traffic may be part of the cause.
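One way to put numbers on these observations (a technique added here, not something the timings alone prove) is to compute the speed-up, the parallel efficiency, and the Karp-Flatt metric e = (1/S - 1/p) / (1 - 1/p), an experimentally determined serial fraction. A serial fraction that grows with the thread count points to overhead or contention rather than an inherently serial part of the program. The timings are assumed to be nanoseconds; the 16-thread row exceeds the 8 physical cores described in the question, so it mostly reflects oversubscription.

```java
class SpeedupReport {
    /** Karp-Flatt metric: experimentally determined serial fraction for p threads. */
    static double karpFlatt(double speedup, int p) {
        return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p);
    }

    public static void main(String[] args) {
        int[] threads = {1, 2, 4, 8, 16};
        // Linux server timings from the question (assumed to be nanoseconds).
        long[] times = {4204436859L, 2760602109L, 1850708620L, 2370905549L, 2422668438L};
        for (int i = 1; i < threads.length; i++) {
            double speedup = (double) times[0] / times[i];
            double efficiency = speedup / threads[i]; // 1.0 would be linear speed-up
            System.out.printf("%2d threads: speed-up %.2f, efficiency %.2f, serial fraction %.2f%n",
                    threads[i], speedup, efficiency, karpFlatt(speedup, threads[i]));
        }
    }
}
```

For instance, the 4-thread row works out to a speed-up of about 2.27 and a serial fraction of about 0.25.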

    Any way to improve performance?

    Use a profiler to determine which parts of your code take up the most time.

    (You can simulate a profiler by running your code in a debugger, pausing it, and noting where the program is. Repeat that about 10 times to see whether the program stops in one part proportionally more often than in others.)
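The "pause and look" approach can also be automated. The sketch below (the class name and sampling parameters are hypothetical) periodically calls `Thread.getStackTrace()` on a worker thread and tallies the method at the top of the stack; a method that appears in most samples is where the time goes. A real profiler such as Java Flight Recorder or VisualVM is far more accurate, and `jstack` gives a one-off snapshot from the command line.

```java
import java.util.HashMap;
import java.util.Map;

class PoorMansProfiler {
    /**
     * Samples the target thread's stack `samples` times, every `intervalMillis`,
     * and counts how often each top-of-stack method (Class.method) is seen.
     */
    static Map<String, Integer> sample(Thread target, int samples, long intervalMillis) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples && target.isAlive(); i++) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                String frame = stack[0].getClassName() + "." + stack[0].getMethodName();
                counts.merge(frame, 1, Integer::sum); // one vote per sample
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return counts;
    }
}
```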

    Use better algorithms, or arrange the data in memory (data structures) in a way that better suits the problem.
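As an illustration of arranging data to suit the access pattern, the hypothetical `LayoutDemo` below sums one field two ways: from an array of small objects, where each element is a reference the CPU must follow, and from a plain `double[]`, which is one contiguous block that hardware prefetching handles well. Whether the object version is actually slower depends on where the JVM placed the objects, so treat this as something to measure, not a guaranteed win.

```java
class LayoutDemo {
    /** Array-of-objects layout: each Particle is a separate heap object. */
    static final class Particle {
        final double x;
        final double y;

        Particle(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    static double sumXObjects(Particle[] particles) {
        double s = 0;
        for (Particle p : particles) {
            s += p.x; // follows one reference per element
        }
        return s;
    }

    /** Structure-of-arrays layout: all x values are contiguous in memory. */
    static double sumXArray(double[] xs) {
        double s = 0;
        for (double x : xs) {
            s += x; // sequential scan of a primitive array
        }
        return s;
    }
}
```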

    Exploit more parallelism in the problem.

    Try to make hard drive reads sequential. For example, have just one thread read from the hard drive and put the data in a concurrent queue for the other threads to process.
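A minimal sketch of that single-reader design, assuming line-oriented input: the calling thread reads the file sequentially and feeds a bounded `ArrayBlockingQueue`, and worker threads take lines off the queue. `ReaderPipeline`, `handle`, the queue capacity of 1024, and the sentinel value are illustrative choices.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

class ReaderPipeline {
    // Unique sentinel object; workers compare by identity, so a line whose
    // text happens to be "<end>" is not mistaken for it.
    private static final String POISON = new String("<end>");

    /**
     * The caller reads the file sequentially and feeds a bounded queue;
     * `workers` threads take lines off the queue and process them.
     * Returns the number of lines processed.
     */
    static long process(Path file, int workers) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
        AtomicLong processed = new AtomicLong();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    String line;
                    while ((line = queue.take()) != POISON) {
                        handle(line);
                        processed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true); // if reading fails, blocked workers do not keep the JVM alive
            threads.add(t);
            t.start();
        }
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String line;
            while ((line = reader.readLine()) != null) {
                queue.put(line); // blocks while the queue is full, i.e. workers are behind
            }
            for (int i = 0; i < workers; i++) {
                queue.put(POISON); // one sentinel per worker
            }
            for (Thread t : threads) {
                t.join();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return processed.get();
    }

    /** Hypothetical per-line work; replace with the real computation. */
    private static void handle(String line) {
        // intentionally empty in this sketch
    }
}
```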

Related Questions

Below is a portion of code parallelized via openMP. Arrays, ap[] and sc[] ,
The following piece of code runs fine when parallelized to 4-5 threads, but starts
I have an OpenMP parallelized program that looks like that: [...] #pragma omp parallel
I need to write a program in Java which will read a relatively large
What are the reasons a parallelized program doesn't achieve the ideal speedup? For example,
I have a program written in Java which involves massive amount of multidimensional array.
What is the rough cost of using threads in java? Are the any rule
I have a section of a Fortran90 program that should be parallelized with OpenMP.
Input the following little sequential program and its parallelized version in Scala REPL: /*
Suppose I write a program using immutable data structures in Java. Even though it

© 2021 The Archive Base. All Rights Reserved