
The Archive Base Latest Questions

Editorial Team
Asked: May 10, 2026

I’ve got a C++ program that’s likely to generate a HUGE amount of data


I’ve got a C++ program that’s likely to generate a HUGE amount of data — billions of binary records of varying sizes, most probably less than 256 bytes each but a few stretching to several kilobytes. Most of the records will seldom be looked at by the program after they’re created, but some will be accessed and modified regularly. There’s no way to tell which are which when they’re created.

Considering the volume of data, there’s no way I can store it all in memory. But as the data only needs to be indexed and accessed by its number (a 64-bit integer), I don’t want the overhead of a full-fledged database program. Ideally I’d like to treat it as an std::map with its data stored on disk until requested.

Is there an already-written library that will do what I’m looking for, or do I need to write it myself?
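A home-brew version of that interface might start out as simple as an append-only data file plus an in-memory index mapping the 64-bit id to a file offset. A minimal sketch, with all names hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch only: an append-only data file plus an
// in-memory index from record id to (offset, size). put() appends
// a fresh copy of the record; get() seeks and reads it back.
class RecordStore {
public:
    explicit RecordStore(const std::string& path)
        : file_(path, std::ios::in | std::ios::out |
                      std::ios::trunc | std::ios::binary) {}

    void put(std::uint64_t id, const std::vector<char>& data) {
        file_.seekp(0, std::ios::end);
        const std::uint64_t off = static_cast<std::uint64_t>(file_.tellp());
        file_.write(data.data(), static_cast<std::streamsize>(data.size()));
        file_.flush();
        index_[id] = Entry{off, data.size()};  // old copy, if any, is leaked
    }

    bool get(std::uint64_t id, std::vector<char>& out) {
        const auto it = index_.find(id);
        if (it == index_.end()) return false;
        out.resize(it->second.size);
        file_.seekg(static_cast<std::streamoff>(it->second.offset));
        file_.read(out.data(), static_cast<std::streamsize>(out.size()));
        return true;
    }

private:
    struct Entry { std::uint64_t offset; std::size_t size; };
    std::fstream file_;
    std::unordered_map<std::uint64_t, Entry> index_;
};
```

Everything hard is missing here: the index vanishes on exit, every update leaks the old record's space, and a crash mid-write can corrupt the file. That is the data-integrity gap a real database closes.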

EDIT: After some thought, I realized that Rob Walker’s answer had a valid point: I’d be hard-pressed to get anywhere near the same kind of data integrity out of a home-brew class that I’d get from a real database.

Although BerkeleyDB (as suggested by RHM) looks like it would do exactly what we’re looking for, the dual-licensing is a headache that we don’t want to deal with. When we’re done with the code and can prove that it would benefit noticeably from BerkeleyDB (which it probably would), we’ll reexamine the issue.

I did look at Ferruccio’s suggestion of stxxl, but I wasn’t able to tell how it would handle the program being interrupted and restarted (maybe with changes). With that much data, I’d hate to just scrap what it had already completed and start over every time, if some of the data could be saved.

So we’ve decided to use an SQLite database, at least for the initial development. Thanks to everyone who answered or voted.



1 Answer

  1. Added an answer on May 10, 2026 at 11:31 pm

    I doubt you will find a library that meets your requirements exactly, so you’ll have to decide on what ‘features’ are really important to you and then decide if an existing DB solution comes close enough.

    Billions of records is a large dataset by any stretch. At what rate are records generated? How long do they persist? Does the access pattern change over time?

    Are updates always with the same amount of data as the original?

    I would suggest proving definitively that a DB solution isn’t going to work before starting to roll your own, particularly if integrity of the data is paramount (and it usually is). Maintaining that volume of data on disk reliably can definitely be a challenge. Do you need any kind of transaction semantics when changing the data? Is the client multithreaded?


© 2021 The Archive Base. All Rights Reserved