Suppose I have a c++ code with many small functions, in each of which

Question

0

Asked: May 30, 20262026-05-30T17:05:35+00:00 2026-05-30T17:05:35+00:00

Suppose I have a c++ code with many small functions, in each of which

0

Suppose I have a c++ code with many small functions, in each of which i will typically need a matrix float M1(n,p) with n,p known at run-time to contain the results of intermediate computations (no need to initialize M1, just to declare it because each function will just overwrite over all rows of M1).

Part of the reason for this is that each function works on an original data matrix that it can’t modify, so many operations (sorting, de-meaning, sphering) need to be done on “elsewhere”.

Is it better practice to create a temporary M1(n,p) within each function, or rather once and for all in the main() and pass it to each function as a sort of bucket that each function can use as scrap space?

n and p are often moderately large [10^2-10^4] for n and [5-100] for p.

(originally posted at the codereview stackexchange but moved here).

Best,

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-30T17:05:36+00:00

heap allocations are indeed, quite expensive.
premature optimization is bad, but if your library is quite general and the matrices are huge, it may not be premature to seek an efficient design. After all, you don’t want to modify your design after you accumulated many dependencies to it.
there are various levels in which you can tackle this problem. You can, for example, avoid the heap allocation expense by tackling it at the memory allocator level (per-thread memory pool, e.g.)
while heap allocation is expensive, you are creating one giant matrix only to do some rather expensive operations on the matrices (typically linear complexity or worse). Relatively speaking, allocating a matrix on the free store may not be that expensive compared to what you are inevitably going to have to do with it subsequently, so it may actually be quite cheap in comparison to the overall logic of a function like sorting.

I recommend you write the code naturally, taking into account #3 as a future possibility. That is, don’t take in references to matrix buffers for intermediary computations to accelerate the creation of temporaries. Make the temporaries and return them by value. Correctness and good, clear interfaces come first.

Mostly the goal here is to separate the creational policy of a matrix (via allocator or other means) which gives you that breathing room to optimize as an afterthought without changing too much existing code. If you can do it by modifying only the implementation details of the functions involved or, better yet, modifying only the implementation of your matrix class, then you’re really well off because then you’re free to optimize without changing the design, and any design which allows that is generally going to be complete from an efficiency standpoint.

WARNING: The following is only intended if you really want to squeeze the most out of every cycle. It is essential to understand #4 and also get yourself a good profiler. It’s also worth noting that you’ll probably do better by optimizing memory access patterns for these matrix algorithms than trying to optimize away the heap allocation.

If you need to optimize the memory allocation, consider optimizing it with something general like a per-thread memory pool. You could make your matrix take in an optional allocator, for instance, but I emphasize optional here and I’d also emphasize correctness first with a trivial allocator implementation.

In other words:

Is it better practice to declare M1(n,p) within each function, or
rather once and for all in the main() and pass it to each function as
a sort of bucket that each function can use as scrap space.

Go ahead and create M1 as a temporary in each function. Try to avoid requiring the client to make some matrix that has no meaning to him/her only to compute intermediary results. That would be exposing an optimization detail which is something we should strive not to do when designing interfaces (hide all details that clients should not have to know about).

Instead, focus on more general concepts if you absolutely want that option to accelerate the creation of these temporaries, like an optional allocator. This fits in with practical designs like with std::set:

std::set<int, std::less<int>, MyFastAllocator<int>> s; // <-- okay

Even though most people just do:

std::set<int> s;

In your case, it might simply be:
M1 my_matrix(n, p, alloc);

It’s a subtle difference, but an allocator is a much more general concept we can use than a cached matrix which otherwise has no meaning to the client except that it’s some kind of cache that your functions require to help them compute results faster. Note that it doesn’t have to be a general allocator. It could just be your preallocated matrix buffer passed in to a matrix constructor, but conceptually it might be good to separate it out merely for the fact that it is something a bit more opaque to clients.

Additionally, constructing this temporary matrix object would also require care not to share it across threads. That’s another reason you probably want to generalize the concept a bit if you do go the optimization route, as something a bit more general like a matrix allocator can take into account thread safety or at the very least emphasize more by design that a separate allocator should be created per thread, but a raw matrix object probably cannot.

The above is only useful if you really care about the quality of your interfaces first and foremost. If not, I’d recommend going with Matthieu’s advice as it is much simpler than creating an allocator, but both of us emphasize making the accelerated version optional.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

Suppose I have a c++ code with many small functions, in each of which

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply