In my C++ project, I have a large, GB binary file on disk that


Asked: May 26, 2026 (2026-05-26T11:42:24+00:00)


In my C++ project, I have a large (gigabyte-scale) binary file on disk that I read into memory for read-only calculations.

My current C++ implementation involves reading the entire chunk into memory once and then spawning threads to read from the chunk in order to do various calculations (mutex-free and runs quickly). Technically, each thread really only needs a small part of the file at a time, so in the future, I may change this implementation to use mmap(), especially if the file gets too big. I’ve noticed this gommap lib so I think I should be covered going forward.

What approach should I take to translate my current C++ threading model (one large chunk of read-only memory) into a Go concurrency model, keeping run-time efficiency in mind?

Goroutines? Alternatives?


1 Answer

Editorial Team · Answer 1 · 2026-05-26T11:42:25+00:00

I’m sure this answer will cop a lot of heat but here goes:

You won’t get reduced running time by switching to Go, especially if your code is already mutex-free. Go doesn’t guarantee efficient balancing of goroutines, and will not currently make the best use of the available cores. The generated code is also generally slower than what a C++ compiler produces. Go’s current strengths are clean abstractions and concurrency, not parallelism.

Reading the entire file up front isn’t particularly efficient if you then have to backtrack through memory. Parts of the file that you won’t use again until much later will be dropped from the cache, only to be reloaded later. You should consider memory mapping if your platform allows it, so that pages are loaded from disk as they’re required.

If there is any intense inter-routine communication, or there are dependencies between the data, you should try to make the algorithm single-threaded. It’s difficult to say without knowing more about the routines you’re applying to the data, but it does sound possible that you’ve pulled out threads prematurely in the hope of getting a magic performance boost.

If you’re unable to rely on memory mapping due to file size, or other platform constraints, you should consider making use of the pread call, thereby reusing a single file descriptor, and only reading as required.

As always, the following rule applies to optimization: you must profile. You must check that the changes you make to a working solution actually improve things. Very often you’ll find that memory mapping, threading and other shenanigans have no noticeable effect on performance whatsoever. It’s also an uphill battle if you’re switching away from C or C++.

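Also, you should spawn goroutines to handle each part of the file, and reduce the results of the calculations through a channel. Make sure GOMAXPROCS has an appropriate value; since Go 1.5 it defaults to the number of available CPUs, so it only needs to be set explicitly on older releases or to restrict parallelism.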
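The goroutine-per-part approach with a channel reduction might look roughly like the sketch below. It assumes the data is already available as a read-only `[]byte` (loaded or memory-mapped); `sumChunk` is a hypothetical stand-in for the real per-part calculation, and `parallelSum` is likewise an illustrative name.

```go
package main

import (
	"fmt"
	"runtime"
)

// sumChunk is a placeholder for the real read-only calculation on one part.
func sumChunk(chunk []byte) uint64 {
	var s uint64
	for _, b := range chunk {
		s += uint64(b)
	}
	return s
}

// parallelSum splits data into at most `workers` contiguous sub-slices,
// processes each one in its own goroutine, and reduces the partial results
// through a channel. The sub-slices share the underlying array, so nothing
// is copied, and no mutex is needed because the data is never written.
func parallelSum(data []byte, workers int) uint64 {
	if workers < 1 {
		workers = 1
	}
	chunkSize := (len(data) + workers - 1) / workers // ceiling division
	if chunkSize == 0 {
		return 0 // empty input
	}

	results := make(chan uint64, workers)
	started := 0
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		started++
		go func(chunk []byte) {
			results <- sumChunk(chunk)
		}(data[start:end])
	}

	// Reduce: collect exactly one partial result per goroutine started.
	var total uint64
	for i := 0; i < started; i++ {
		total += <-results
	}
	return total
}

func main() {
	data := make([]byte, 1<<20)
	for i := range data {
		data[i] = byte(i)
	}
	fmt.Println(parallelSum(data, runtime.NumCPU()))
}
```

Using one goroutine per CPU is only a starting point; whether it beats a single-threaded loop for a given calculation is something to measure rather than assume.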
