Suppose I have C++ code with many small functions, each of which typically needs a float matrix M1(n,p), with n and p known at run time, to hold the results of intermediate computations. (M1 does not need to be initialized, only declared, because each function overwrites all of its rows.)
Part of the reason for this is that each function works on an original data matrix that it can’t modify, so many operations (sorting, de-meaning, sphering) need to be done “elsewhere”.
Is it better practice to create a temporary M1(n,p) within each function, or to create it once and for all in main() and pass it to each function as a sort of bucket that each function can use as scrap space?
n and p are moderately large: n is typically in [10^2, 10^4] and p in [5, 100].
(Originally posted on Code Review Stack Exchange but moved here.)
I recommend you write the code naturally, taking into account #3 as a future possibility. That is, don’t take in references to matrix buffers for intermediary computations to accelerate the creation of temporaries. Make the temporaries and return them by value. Correctness and good, clear interfaces come first.
Mostly, the goal here is to separate out the creation policy of a matrix (via an allocator or other means), which gives you breathing room to optimize as an afterthought without changing much existing code. If you can do that by modifying only the implementation details of the functions involved or, better yet, only the implementation of your matrix class, then you’re really well off: you’re free to optimize without changing the design, and any design that allows that is generally going to hold up from an efficiency standpoint.
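As a rough illustration of "write it naturally", here is a minimal sketch of a function that builds its own temporary and returns it by value. The Matrix struct and the demean_columns name are hypothetical stand-ins for whatever matrix class the real code uses, and a row-major layout is assumed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal row-major matrix, standing in for the real class.
// Note: std::vector zero-fills its storage, which the question does not need.
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;

    Matrix(std::size_t n, std::size_t p) : rows(n), cols(p), data(n * p) {}

    float& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// De-means every column of x. The input is taken by const reference and is
// never modified; the scratch matrix is a local that is returned by value
// (moved or elided by the compiler, not deep-copied).
Matrix demean_columns(const Matrix& x) {
    Matrix out(x.rows, x.cols);
    for (std::size_t j = 0; j < x.cols; ++j) {
        float mean = 0.0f;
        for (std::size_t i = 0; i < x.rows; ++i) mean += x(i, j);
        if (x.rows > 0) mean /= static_cast<float>(x.rows);
        for (std::size_t i = 0; i < x.rows; ++i) out(i, j) = x(i, j) - mean;
    }
    return out;
}
```

The caller just writes something like `Matrix centered = demean_columns(data);` and never sees a scratch buffer. If allocation later shows up in a profile, only the inside of Matrix or of the function has to change.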
WARNING: The following is intended only if you really want to squeeze the most out of every cycle. It is essential to understand #4 and also to get yourself a good profiler. It’s also worth noting that you’ll probably gain more by optimizing memory access patterns for these matrix algorithms than by trying to optimize away the heap allocation.
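To make the remark about memory access patterns concrete, here is a small sketch assuming the matrix is stored row-major in a flat std::vector<float> (that layout and the column_sums name are assumptions, not something stated in the question). The column sums are accumulated by walking each row contiguously rather than striding down one column at a time.

```cpp
#include <cstddef>
#include <vector>

// Column sums of a row-major n-by-p matrix held in a flat vector.
// The outer loop runs over rows so the inner loop reads consecutive
// floats, which is friendlier to the cache than jumping p elements
// at a time down a single column.
std::vector<float> column_sums(const std::vector<float>& x,
                               std::size_t n, std::size_t p) {
    std::vector<float> sums(p, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        const float* row = x.data() + i * p;
        for (std::size_t j = 0; j < p; ++j) sums[j] += row[j];
    }
    return sums;
}
```

Whether this ordering matters for n and p in the ranges given is something only a profiler can confirm.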
If you need to optimize the memory allocation, consider optimizing it with something general like a per-thread memory pool. You could make your matrix take in an optional allocator, for instance, but I emphasize optional here and I’d also emphasize correctness first with a trivial allocator implementation.
In other words:
Go ahead and create M1 as a temporary in each function. Try to avoid requiring the client to make some matrix that has no meaning to him/her only to compute intermediary results. That would be exposing an optimization detail which is something we should strive not to do when designing interfaces (hide all details that clients should not have to know about).
Instead, focus on more general concepts if you absolutely want that option to accelerate the creation of these temporaries, like an optional allocator. This fits in with practical designs like std::set, which accepts an allocator as an optional template argument even though most people just use the default.
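For reference, the two ways of spelling a std::set look roughly like this. The alias names ExplicitSet and DefaultSet are made up for illustration; the template parameters themselves are the standard ones.

```cpp
#include <functional>
#include <memory>
#include <set>
#include <type_traits>

// Every template argument spelled out, including the allocator.
using ExplicitSet = std::set<int, std::less<int>, std::allocator<int>>;

// What most code writes; comparator and allocator fall back to their defaults.
using DefaultSet = std::set<int>;

// Both spellings name the same type, so the allocator is truly optional.
static_assert(std::is_same<ExplicitSet, DefaultSet>::value,
              "defaults match the explicit arguments");
```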
In your case, it might simply be:
M1 my_matrix(n, p, alloc);
It’s a subtle difference, but an allocator is a much more general concept we can use than a cached matrix which otherwise has no meaning to the client except that it’s some kind of cache that your functions require to help them compute results faster. Note that it doesn’t have to be a general allocator. It could just be your preallocated matrix buffer passed in to a matrix constructor, but conceptually it might be good to separate it out merely for the fact that it is something a bit more opaque to clients.
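One way this could look, as a sketch only: a matrix class template whose allocator parameter defaults to std::allocator, mirroring std::set. Both Matrix and CountingAlloc here are hypothetical; CountingAlloc merely counts allocations to show that the custom allocator is actually used, where a real one might hand out memory from a preallocated pool.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical matrix class. The allocator defaults to std::allocator,
// so callers who do not care never have to mention it.
template <class Alloc = std::allocator<float>>
class Matrix {
public:
    Matrix(std::size_t n, std::size_t p, const Alloc& alloc = Alloc())
        : rows_(n), cols_(p), data_(n * p, 0.0f, alloc) {}

    float& operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }
    float operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<float, Alloc> data_;  // row-major storage
};

// Illustrative allocator that forwards to std::allocator and counts calls.
template <class T>
struct CountingAlloc {
    using value_type = T;
    std::size_t* count;  // not owned; must outlive the allocator

    explicit CountingAlloc(std::size_t* c) : count(c) {}
    template <class U>
    CountingAlloc(const CountingAlloc<U>& other) : count(other.count) {}

    T* allocate(std::size_t n) {
        ++*count;
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* ptr, std::size_t n) { std::allocator<T>().deallocate(ptr, n); }
};

template <class T, class U>
bool operator==(const CountingAlloc<T>& a, const CountingAlloc<U>& b) {
    return a.count == b.count;
}
template <class T, class U>
bool operator!=(const CountingAlloc<T>& a, const CountingAlloc<U>& b) {
    return !(a == b);
}
```

Ordinary callers write `Matrix<> m(n, p);`, while performance-sensitive code can write `Matrix<CountingAlloc<float>> m(n, p, alloc);` with whatever allocator it prefers. The functions that use the matrix do not need to know which one was chosen.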
Additionally, a preallocated temporary matrix object requires care not to share it across threads. That’s another reason you probably want to generalize the concept a bit if you do go the optimization route: something more general like a matrix allocator can take thread safety into account, or at the very least emphasize by design that a separate allocator should be created per thread, whereas a raw matrix object probably cannot.
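A per-thread pool could be sketched with the C++17 std::pmr facilities, assuming a compiler and standard library that provide <memory_resource>. The helper names scratch_resource and make_scratch are hypothetical.

```cpp
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical helper: each thread lazily gets its own pool, so the pool
// needs no locking and is never shared between threads.
inline std::pmr::memory_resource* scratch_resource() {
    thread_local std::pmr::unsynchronized_pool_resource pool;
    return &pool;
}

// Scratch storage for an n-by-p float matrix, drawn from the calling
// thread's pool. The buffer must not outlive the thread that created it,
// because the pool is destroyed when that thread exits.
inline std::pmr::vector<float> make_scratch(std::size_t n, std::size_t p) {
    return std::pmr::vector<float>(n * p, scratch_resource());
}
```

Pool resources forward requests above an implementation-defined block size to their upstream resource, so for the larger matrices mentioned in the question this may not beat plain new/delete. That is something to measure rather than assume.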
The above is only useful if you really care about the quality of your interfaces first and foremost. If not, I’d recommend going with Matthieu’s advice as it is much simpler than creating an allocator, but both of us emphasize making the accelerated version optional.