I have a case here on which I would like some opinions from the experts 🙂
Situation:
- I have a data structure with `Int32` and `Double` values, 108 bytes in total.
- I have to process a large series of these items. It's something like this (conceptual; I will use a `for` loop instead):
  ```csharp
  double result = 0;
  foreach (Item item in series)
  {
      result += Calculate(item); // Calculate is a hypothetical stand-in for "some calculation based on item"
  }
  ```
- I expect the size of the series to be about 10 MB.
- To be useful, the whole series must be processed. It’s all or nothing.
- The series data will never change.
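If the item ends up as a value type, here is a minimal sketch of the loop above that avoids the per-iteration copy (C# 7.2 or later, no `unsafe` code). `foreach` over an array of structs semantically copies each element into the loop variable, whereas a `ref readonly` local refers to the array element in place; marking the struct `readonly` also lets the compiler skip defensive copies when it is accessed through `in` or `ref readonly`. The three fields, `Calculate` and `SeriesMath` are hypothetical: the real struct has 108 bytes of `Int32`/`Double` fields, and the real calculation isn't specified.

```csharp
using System;

// Hypothetical item layout: the question only says it mixes Int32 and Double
// values (108 bytes in total); these field names are made up.
public readonly struct Item
{
    public Item(int count, double weight, double value)
    {
        Count = count;
        Weight = weight;
        Value = value;
    }

    public int Count { get; }
    public double Weight { get; }
    public double Value { get; }
}

public static class SeriesMath
{
    // Hypothetical placeholder for "some calculation based on item".
    // `in` passes the struct by read-only reference instead of by copy.
    public static double Calculate(in Item item) => item.Count * item.Weight + item.Value;

    public static double Sum(Item[] series)
    {
        double result = 0;
        for (int i = 0; i < series.Length; i++)
        {
            // Refer to the array element in place rather than copying the whole struct.
            ref readonly Item item = ref series[i];
            result += Calculate(in item);
        }
        return result;
    }
}
```

Whether this actually beats the plain `foreach` (which the JIT may optimize well on its own) is something to measure, not assume.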
My requirements:
- Memory consumption is not an issue. I think that nowadays, if the user doesn’t have a few dozen MB free on his machine, he probably has a deeper problem.
- Speed is a concern. I want the iteration to be as fast as possible.
- No unmanaged code, no interop, not even `unsafe`.
What I would like to know:
- Should I implement the item data structure as a value type or a reference type? From what I know, value types are cheaper, but I imagine that with a value type each item will be copied on every iteration. Is that copy faster than a heap access?
- Is there any real problem if I implement the accessors as auto-implemented ("anonymous") properties? I believe this will increase the footprint, but also that the getter will be inlined anyway. Can I safely assume that?
- I’m seriously considering creating a very large `static readonly` array of the series directly in code (it’s rather easy to do this from the data source). This would give me a 10 MB assembly. Is there any reason I should avoid this?
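Regarding the 10 MB `static readonly` array: one alternative worth comparing is to keep the series in a binary file (or an embedded resource) and load it into an array once at startup. As far as I know, the C# compiler only has a compact data-blob form for array initializers of primitive element types, so an initializer for an array of a custom struct is emitted as element-by-element code in the static constructor. Below is a minimal sketch using `BinaryWriter`/`BinaryReader`, assuming a hypothetical two-field `Item` and a made-up file format (an `Int32` item count followed by each item's fields in declaration order).

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical two-field item; the real one would carry all 108 bytes of Int32/Double values.
public readonly struct Item
{
    public Item(int count, double value)
    {
        Count = count;
        Value = value;
    }

    public int Count { get; }
    public double Value { get; }
}

public static class SeriesStore
{
    // Writes the series once, e.g. from the tool that would otherwise generate the source-code array.
    public static void Save(Stream stream, IReadOnlyList<Item> items)
    {
        using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(items.Count);
            foreach (Item item in items)
            {
                writer.Write(item.Count);
                writer.Write(item.Value);
            }
        }
    }

    // Reads the whole series back into a flat array; call this once at startup.
    public static Item[] Load(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            int length = reader.ReadInt32();
            var items = new Item[length];
            for (int i = 0; i < length; i++)
            {
                int count = reader.ReadInt32();
                double value = reader.ReadDouble();
                items[i] = new Item(count, value);
            }
            return items;
        }
    }
}
```

At startup you would open the file (or the stream returned by `Assembly.GetManifestResourceStream` for an embedded resource) and assign the result of `SeriesStore.Load` to your `static readonly` field; the iteration code stays the same either way.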
I hope someone can give me a good opinion on this.
Thanks
Value type or reference type? Code it both ways and profile it aggressively on real-world input. Then you’ll know exactly which one is faster.
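A minimal sketch of "code it both ways", assuming hypothetical `ItemStruct`/`ItemClass` types with identical fields and a made-up calculation. It uses `System.Diagnostics.Stopwatch`, makes one warm-up call so JIT compilation isn't timed, and accumulates results into a field so the work can't be discarded as unused.

```csharp
using System;
using System.Diagnostics;

// Hypothetical item shapes with identical fields: one value type, one reference type.
public struct ItemStruct
{
    public int Count;
    public double Value;
}

public sealed class ItemClass
{
    public int Count;
    public double Value;
}

public static class SeriesBenchmark
{
    // Written to after every run so the results are observably used.
    public static double Checksum;

    public static double SumStructs(ItemStruct[] series)
    {
        double result = 0;
        for (int i = 0; i < series.Length; i++)
        {
            result += series[i].Count * series[i].Value; // reads the element in place in the array
        }
        return result;
    }

    public static double SumClasses(ItemClass[] series)
    {
        double result = 0;
        for (int i = 0; i < series.Length; i++)
        {
            result += series[i].Count * series[i].Value; // follows a reference to a heap object
        }
        return result;
    }

    // Calls `run` once to warm up (JIT compilation), then times `repetitions` further calls.
    public static TimeSpan Time(Func<double> run, int repetitions)
    {
        Checksum += run();
        var stopwatch = Stopwatch.StartNew();
        for (int r = 0; r < repetitions; r++)
        {
            Checksum += run();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Fill both arrays from the real data, call `Time` for each variant in a Release build without a debugger attached, and compare. A benchmarking library such as BenchmarkDotNet handles warm-up and statistics more rigorously than a hand-rolled `Stopwatch` loop.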
Any real problem with auto-implemented properties? No.
Will the getter be inlined? You can only safely assume things guaranteed by the spec, and inlining is not guaranteed by the spec.
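For the footprint part of the question: an auto-implemented property's hidden backing field takes the same space as a hand-written field, and the getter is one method per type rather than per instance, so the size of each item doesn't grow. Inlining is the JIT's decision. If you want to nudge it, `MethodImplOptions.AggressiveInlining` can be applied to a getter, but it is still a request, not a guarantee. A small sketch with a hypothetical `Sample` struct showing both forms:

```csharp
using System.Runtime.CompilerServices;

public readonly struct Sample
{
    private readonly double _value;

    public Sample(int count, double value)
    {
        Count = count;
        _value = value;
    }

    // Auto-implemented property: the compiler generates a hidden backing field and a trivial getter.
    // Small getters like this are commonly inlined by the JIT, but the C# spec does not promise it.
    public int Count { get; }

    // Explicit getter with an inlining hint; AggressiveInlining asks the JIT to inline, it does not force it.
    public double Value
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get => _value;
    }
}
```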
As for the 10 MB static array: I think you’re probably worrying about this too much.
I’m sorry if my answer seems dismissive. You’re asking random people on the Internet to speculate which of two things is faster. We can guess, and we might be right, but you could just code it both ways in the blink of an eye and know exactly which is faster. So, just do it?
For what it’s worth, I always code for correctness, readability and maintainability first. I establish reasonable performance requirements up front, and I see if my implementation meets them. If it does, I move on. If I need more performance from my application, I profile it to find the bottlenecks and then I start worrying.
You’re asking about a trivial computation of roughly 10,000,000 / 108 ≈ 93,000 iterations, i.e. on the order of 100,000. Is this even a bottleneck in your application? Seriously, you are overthinking this. Just code it and move on.