How big does a buffer need to be in Java before it’s worth reusing?
Or, put another way: I can repeatedly allocate, use, and discard byte[] objects OR run a pool to keep and reuse them. I might allocate a lot of small buffers that get discarded often, or a few big ones that don’t. At what size is it cheaper to pool them than to reallocate, and how do small allocations compare to big ones?
EDIT:
Ok, specific parameters. Say an Intel Core 2 Duo CPU, latest VM version for OS of choice. This question isn’t as vague as it sounds… a little code and a graph could answer it.
EDIT2:
You’ve posted a lot of good general rules and discussions, but the question really asks for numbers. Post ’em (and code too)! Theory is great, but the proof is the numbers. It doesn’t matter if results vary some from system to system, I’m just looking for a rough estimate (order of magnitude). Nobody seems to know if the performance difference will be a factor of 1.1, 2, 10, or 100+, and this is something that matters. It is important for any Java code working with big arrays — networking, bioinformatics, etc.
Suggestions to get a good benchmark:
- Warm up code before running it in the benchmark. Methods should all be called at least 10000 times to get full JIT optimization.
- Make sure benchmarked methods run for at least 10 seconds and use System.nanoTime if possible, to get accurate timings.
- Run benchmark on a system that is only running minimal applications
- Run benchmark 3-5 times and report all times, so we see how consistent it is.
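A minimal sketch of a benchmark that follows these suggestions: it warms up both code paths, times them with System.nanoTime, and reports several runs per buffer size. The class name, method names, and the default of 64 MB per measurement are assumptions made for illustration. Both strategies fill the whole buffer to stand in for real use, so the difference between them is roughly the allocation and GC overhead. The reuse path is a best case for pooling, because it has no locking or bookkeeping.

```java
import java.util.Arrays;

public class BufferAllocBench {
    // Written to on every iteration so the JIT cannot discard the work as dead code.
    static long sink;

    // Strategy 1: allocate a fresh (JVM-zeroed) buffer per iteration, use it, drop it.
    static void allocateEachTime(int size, int iterations) {
        for (int i = 0; i < iterations; i++) {
            byte[] buf = new byte[size];
            Arrays.fill(buf, (byte) i);   // stand-in for real work on the buffer
            sink += buf[size - 1];
        }
    }

    // Strategy 2: reuse a single buffer (best case for a pool: no locking at all).
    static void reuse(int size, int iterations) {
        byte[] buf = new byte[size];
        for (int i = 0; i < iterations; i++) {
            Arrays.fill(buf, (byte) i);
            sink += buf[size - 1];
        }
    }

    // Times one strategy and converts the result to megabytes processed per second.
    static double mbPerSecond(boolean pooled, int size, int iterations) {
        long start = System.nanoTime();
        if (pooled) {
            reuse(size, iterations);
        } else {
            allocateEachTime(size, iterations);
        }
        long elapsed = Math.max(1, System.nanoTime() - start);
        double megabytes = (double) size * iterations / (1024.0 * 1024.0);
        return megabytes / (elapsed / 1e9);
    }

    public static void main(String[] args) {
        // Megabytes per measurement. 64 is an arbitrary default that keeps the demo
        // short; raise it until every measurement runs for several seconds.
        long mbPerRun = args.length > 0 ? Long.parseLong(args[0]) : 64;
        int[] sizes = {8 * 1024, 32 * 1024, 128 * 1024};
        int runs = 3;

        // Warm-up: run both loops long enough for the JIT to compile them.
        for (int size : sizes) {
            allocateEachTime(size, 20_000);
            reuse(size, 20_000);
        }

        for (int size : sizes) {
            int iterations = (int) (mbPerRun * 1024 * 1024 / size);
            for (int run = 1; run <= runs; run++) {
                System.out.printf("%6d bytes, run %d: allocate %.0f MB/s, reuse %.0f MB/s%n",
                        size, run,
                        mbPerSecond(false, size, iterations),
                        mbPerSecond(true, size, iterations));
            }
        }
        System.out.println("(ignore) sink = " + sink);
    }
}
```

The numbers depend heavily on heap size and collector, so it is worth repeating the run with different -Xms/-Xmx settings. Results from a loop like this say little about an application that keeps many live objects on the heap.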
I know this is a vague and somewhat demanding question. I will check this question regularly, and answers will get comments and rated up consistently. Lazy answers will not (see below for criteria). If I don’t have any answers that are thorough, I’ll attach a bounty. I might anyway, to reward a really good answer with a little extra.
What I know (and don’t need repeated):
- Java memory allocation and GC are fast and getting faster.
- Object pooling used to be a good optimization, but now it hurts performance most of the time.
- Object pooling is “not usually a good idea unless objects are expensive to create.” Yadda yadda.
What I DON’T know:
- How fast should I expect memory allocations to run (MB/s) on a standard modern CPU?
- How does allocation size affect allocation rate?
- What’s the break-even point for number/size of allocations vs. re-use in a pool?
Routes to an ACCEPTED answer (the more the better):
- A recent whitepaper showing figures for allocation & GC on modern CPUs (recent as in last year or so, JVM 1.6 or later)
- Code for a concise and correct micro-benchmark I can run
- Explanation of how and why the allocations impact performance
- Real-world examples/anecdotes from testing this kind of optimization
The Context:
I’m working on a library adding LZF compression support to Java. This library extends the H2 DBMS LZF classes, by adding additional compression levels (more compression) and compatibility with the byte streams from the C LZF library. One of the things I’m thinking about is whether or not it’s worth trying to reuse the fixed-size buffers used to compress/decompress streams. The buffers may be ~8 kB, or ~32 kB, and in the original version they’re ~128 kB. Buffers may be allocated one or more times per stream. I’m trying to figure out how I want to handle buffers to get the best performance, with an eye toward potentially multithreading in the future.
Yes, the library WILL be released as open source if anyone is interested in using this.
If you want a simple answer, it is that there is no simple answer. No amount of calling answers (and by implication people) “lazy” is going to help.
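“How fast should I expect memory allocations to run (MB/s) on a standard modern CPU?”
At the speed at which the JVM can zero memory, assuming that the allocation does not trigger a garbage collection. If it does trigger garbage collection, it is impossible to predict without knowing what GC algorithm is used, the heap size and other parameters, and an analysis of the application’s working set of non-garbage objects over the lifetime of the app.
“How does allocation size affect allocation rate?”
See above.
“What’s the break-even point for number/size of allocations vs. re-use in a pool?”
If you want a simple answer, it is that there is no simple answer.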
The golden rule is, the bigger your heap is (up to the amount of physical memory available), the smaller the amortized cost of GC’ing a garbage object. With a fast copying garbage collector, the amortized cost of freeing a garbage object approaches zero as the heap gets larger. The cost of the GC is actually determined by (in simplistic terms) the number and size of non-garbage objects that the GC has to deal with.
Under the assumption that your heap is large, the lifecycle cost of allocating and GC’ing a large object (in one GC cycle) approaches the cost of zeroing the memory when the object is allocated.
EDIT: If all you want is some simple numbers, write a simple application that allocates and discards large buffers and run it on your machine with various GC and heap parameters and see what happens. But beware that this is not going to give you a realistic answer because real GC costs depend on an application’s non-garbage objects.
I’m not going to write a benchmark for you because I know that it would give you bogus answers.
EDIT 2: In response to the OP’s comments.
Theoretically yes. In practice, it is difficult to measure in a way that separates the allocation costs from the GC costs.
No, I’m saying it is likely to increase performance. Significantly. (Provided that you don’t run into OS-level virtual memory effects.)
Maybe. Frankly, I think that you are not going to get much improvement by recycling buffers.
But if you are intent on going down this path, create a buffer pool interface with two implementations. The first is a real thread-safe buffer pool that recycles buffers. The second is a dummy pool which simply allocates a new buffer each time alloc is called, and treats dispose as a no-op. Finally, allow the application developer to choose between the pool implementations via a setBufferPool method and/or constructor parameters and/or runtime configuration properties. The application should also be able to supply a buffer pool class / instance of its own making.
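A rough sketch of what that design could look like. Only alloc, dispose, and setBufferPool come from the description above. The interface name, the class names (RecyclingBufferPool, AllocatingBufferPool, StreamCodec), the fixed buffer size, and the use of ArrayBlockingQueue for thread safety are all assumptions made for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical interface: the two method names follow the description above.
interface BufferPool {
    byte[] alloc();
    void dispose(byte[] buffer);
}

// Thread-safe pool that recycles fixed-size buffers, keeping at most maxPooled of them.
final class RecyclingBufferPool implements BufferPool {
    private final int bufferSize;
    private final BlockingQueue<byte[]> free;

    RecyclingBufferPool(int bufferSize, int maxPooled) {
        this.bufferSize = bufferSize;
        this.free = new ArrayBlockingQueue<>(maxPooled);
    }

    @Override
    public byte[] alloc() {
        byte[] buf = free.poll();              // non-blocking; null when the pool is empty
        return buf != null ? buf : new byte[bufferSize];
    }

    @Override
    public void dispose(byte[] buffer) {
        // Recycled buffers are NOT zeroed. Buffers of the wrong size, or buffers
        // returned while the pool is full, are simply left to the garbage collector.
        if (buffer != null && buffer.length == bufferSize) {
            free.offer(buffer);
        }
    }
}

// Dummy pool: always allocates, and dispose is a no-op.
final class AllocatingBufferPool implements BufferPool {
    private final int bufferSize;

    AllocatingBufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    @Override
    public byte[] alloc() {
        return new byte[bufferSize];
    }

    @Override
    public void dispose(byte[] buffer) {
        // Intentionally empty.
    }
}

// Hypothetical consumer showing how the pool could be swapped by the application.
final class StreamCodec {
    private volatile BufferPool pool = new AllocatingBufferPool(32 * 1024);

    void setBufferPool(BufferPool pool) {
        this.pool = pool;
    }

    int process() {
        BufferPool p = pool;                   // use the same pool for alloc and dispose
        byte[] buf = p.alloc();
        try {
            return buf.length;                 // placeholder for real compression work
        } finally {
            p.dispose(buf);
        }
    }
}
```

Defaulting to the allocating implementation keeps the simple behaviour unless an application opts in, for example with codec.setBufferPool(new RecyclingBufferPool(32 * 1024, 16)), which makes it easy to measure both variants under a real workload.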