I have a large array in RAM and want to read data from it as fast as possible. Ignore any synchronization concerns; I am only interested in the theory.
Is it faster to spread those reads over several threads than just using one?
Edit: the data points are about 20KB each and I can’t predict in which order they are read.
Generally speaking, yes, but beware of cache misses.
Let’s say that you have an int[]: consider partitioning it into ranges of consecutive elements and giving each thread a range of its own (thread1 gets 0 to 127, thread2 gets 128 to 255, …).
When you read one element of the array, the core executing the load pulls the whole cache line containing it into its cache, so some of the following elements come along too. Most of the time that is exactly what you want, because they are needed right after (imagine for (int i = 0; ; i++) do(array[i])). If you don’t partition your data coarsely, so that each thread works through its own contiguous range, all of that work is wasted.
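A minimal sketch of the coarse partitioning described above, in Java. The class name PartitionedRead, the method parallelSum and the choice of summing as the "read" are assumptions made for illustration only. Each thread gets one contiguous range, so the cache lines it loads are consumed by that same thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class: not taken from the original answer.
public class PartitionedRead {

    // Sums data by giving each thread one contiguous range [from, to).
    // Assumes threads >= 1.
    static long parallelSum(int[] data, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (data.length + threads - 1) / threads; // ceiling division
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int from = Math.min(data.length, t * chunk);
                final int to = Math.min(data.length, from + chunk);
                Callable<Long> task = () -> {
                    long sum = 0; // thread-local accumulator, no shared writes
                    for (int i = from; i < to; i++) {
                        sum += data[i];
                    }
                    return sum;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // wait for each range and combine
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[1 << 20];
        for (int i = 0; i < data.length; i++) {
            data[i] = i % 7;
        }
        System.out.println(parallelSum(data, 4));
    }
}
```

The opposite layout, where thread t reads elements t, t + threads, t + 2 * threads and so on, would make every thread touch nearly every cache line, which is the waste the paragraph above warns about.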
You can read more about this in the following articles from Joe Duffy:
Not strictly related: The ‘premature optimization is evil’ myth, in particular the part “Understand the order of magnitude that matters”
As @Alex said, the general rule is to always measure and never assume anything: efficient scalability via concurrency is a complex subject and requires a deep understanding of the underlying architecture.
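To make "measure" concrete, here is a rough timing sketch in Java for the case from the question: records of about 20 KB visited in an unpredictable order, once on a single thread and once in parallel. Everything in it (the class ReadBenchmark, the record count, the checksum used as the read) is an assumption for illustration. It is not a rigorous benchmark; a harness such as JMH would be the safer choice for real numbers.

```java
import java.util.Random;
import java.util.stream.IntStream;

// Hypothetical example class: not taken from the original answer.
public class ReadBenchmark {
    static final int RECORD_SIZE = 20 * 1024; // about 20 KB per data point

    // Reads every byte of one record; the checksum keeps the JIT from dropping the loop.
    static long readRecord(byte[] record) {
        long sum = 0;
        for (byte b : record) {
            sum += b;
        }
        return sum;
    }

    // Visits records in the given order on the calling thread.
    static long readSequential(byte[][] records, int[] order) {
        long sum = 0;
        for (int idx : order) {
            sum += readRecord(records[idx]);
        }
        return sum;
    }

    // Visits the same records, letting the common fork-join pool split the order array.
    static long readParallel(byte[][] records, int[] order) {
        return IntStream.of(order)
                .parallel()
                .mapToLong(idx -> readRecord(records[idx]))
                .sum();
    }

    public static void main(String[] args) {
        int count = 2_000; // assumed size, roughly 40 MB of records
        Random rnd = new Random(42);
        byte[][] records = new byte[count][RECORD_SIZE];
        for (byte[] r : records) {
            rnd.nextBytes(r);
        }
        int[] order = rnd.ints(count, 0, count).toArray(); // unpredictable access order

        for (int run = 0; run < 5; run++) { // repeat so JIT warm-up does not dominate
            long t0 = System.nanoTime();
            long a = readSequential(records, order);
            long t1 = System.nanoTime();
            long b = readParallel(records, order);
            long t2 = System.nanoTime();
            System.out.printf("run %d: 1 thread %d us, parallel %d us (checksums equal: %b)%n",
                    run, (t1 - t0) / 1_000, (t2 - t1) / 1_000, a == b);
        }
    }
}
```

Because each record is itself a contiguous 20 KB block, a thread that reads a whole record still benefits from the cache-line effect described earlier, even though the order of records is random. Whether the parallel version wins on a given machine is exactly what the timings are there to show.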