I need a library which would help me to save and query data in


Asked: June 12, 2026 (2026-06-12T01:03:34+00:00)


I need a library that would help me save and query data in a condensed format (a mini DSL, in essence). Here’s a sample of what I want:

Update 1: Please note that the figures in the samples are kept small just to make the logic easier to follow. The real figures are limited only by the capacity of the C# long type, e.g.:
1,18,28,29,39,18456789,18456790,18456792,184567896.

Sample Raw Data set: 1,2,3,8,11,12,13,14

Condensed Sample Data set:
1..3,8,11..14

What would be really nice to have is the ability to represent 1,2,4,5,6,7,8,9,10 as 1..10-3.

Querying Sample Data set:

Query 1 (get range):
1..5 -> 1..3

Query 2 (check if the value exists):
?2 -> true

Query 3 (get multiple ranges and scalar values):
1..5,11..12,14 -> 1..3,11..12,14

I don’t want to develop it from scratch and would strongly prefer to use something that already exists.


1 Answer

Editorial Team · Answer 1 · 2026-06-12T01:03:36+00:00

Here are some ideas I’ve had over the days since I read your question. I can’t be sure any of them really apply to your use case but I hope you’ll find something useful here.

Storing your data compressed

Steps you can take to reduce the amount of space your numbers take up on disk:

If your values are between 1 and ~10M, don’t use a long, use a uint. (4 bytes per number.)
Actually, don’t use a uint. Store your numbers 7 bits to a byte, with the remaining bit used to say “there are more bytes in this number”. (Then 1-127 will fit in 1 byte, 128-~16k in 2 bytes, ~16k-~2M in 3 bytes, ~2M-~270M in 4 bytes.)

This should reduce your storage from 8 bytes per number (if you were originally storing them as longs) to, say, on average 3 bytes. Also, if you end up needing bigger numbers, the variable-byte storage will be able to hold them.
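To make the 7-bits-per-byte idea concrete, here is a minimal sketch of what I have in mind. The class and method names (`VarInt`, `Write`, `TryRead`) are made up for illustration and don't come from any library, and I'm assuming the low-order 7 bits are written first.

```csharp
using System;
using System.IO;

// Hypothetical helper (illustrative names, not part of any library).
public static class VarInt
{
    // Writes the value 7 bits at a time, low-order group first.
    // The high bit of each byte means "more bytes follow".
    public static void Write(Stream output, ulong value)
    {
        while (value >= 0x80UL)
        {
            output.WriteByte((byte)((value & 0x7FUL) | 0x80UL));
            value >>= 7;
        }
        output.WriteByte((byte)value);
    }

    // Reads one value. Returns false when the stream ends cleanly
    // between numbers; throws if it ends in the middle of one.
    public static bool TryRead(Stream input, out ulong value)
    {
        value = 0;
        int shift = 0;
        while (true)
        {
            int b = input.ReadByte();
            if (b < 0)
            {
                if (shift == 0) return false;
                throw new EndOfStreamException("Truncated number.");
            }
            if (shift > 63) throw new InvalidDataException("Number too long.");
            value |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return true;
            shift += 7;
        }
    }
}
```

With this layout 127 takes one byte, 128 takes two, and a value like 184567896 from your update takes four.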

Then I can think of a couple of ways to reduce it further, given that you know the numbers are always increasing and may contain lots of runs. Only trying them on your actual data will tell you which works best.

For each of your actual numbers, store two numbers: the number itself, followed by the count of contiguous numbers after it (e.g. 2,3,4,5,6 => 2,4). You’ll have to store lone numbers as e.g. 8,0, which increases storage for those, but if your data has lots of runs (especially long ones) this should reduce storage on average. You could further store “single gaps” in runs as e.g. 1,2,3,5,6,7 => 1,6,4 (unambiguous, as 4 is too small to be the start of the next run), but this will make processing more complex and won’t save much space, so I wouldn’t bother.
Or, rather than storing the numbers themselves, store the deltas (so 3,4,5,7,8,9 => 3,1,1,2,1,1). This will reduce the number of bytes used for storing larger numbers (e.g. 15000,15005 (4 bytes) => 15000,5 (3 bytes)). Further, if the data contains a lot of runs (i.e. lots of bytes with the value 1), it will then compress (e.g. zip) nicely.
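Both options can be sketched in a few lines. This is only an illustration under my own assumptions: the input is strictly increasing, values fit in a ulong, and the names (`SortedSetCodec`, `ToRuns`, `ToDeltas`, `FromDeltas`) are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (illustrative names); input must be strictly increasing.
public static class SortedSetCodec
{
    // Run encoding: each run becomes (start, count of contiguous numbers after it).
    // 2,3,4,5,6 => (2, 4); a lone 8 => (8, 0).
    public static IEnumerable<(ulong Start, ulong Extra)> ToRuns(IEnumerable<ulong> sorted)
    {
        using (var e = sorted.GetEnumerator())
        {
            if (!e.MoveNext()) yield break;
            ulong start = e.Current, extra = 0;
            while (e.MoveNext())
            {
                if (e.Current == start + extra + 1)
                {
                    extra++;                      // still inside the current run
                }
                else
                {
                    yield return (start, extra);  // run ended, emit it
                    start = e.Current;
                    extra = 0;
                }
            }
            yield return (start, extra);          // emit the final run
        }
    }

    // Delta encoding: the first value as-is, then the difference from the previous value.
    public static IEnumerable<ulong> ToDeltas(IEnumerable<ulong> sorted)
    {
        ulong previous = 0;
        foreach (ulong value in sorted)
        {
            yield return value - previous;
            previous = value;
        }
    }

    // Reverses ToDeltas by keeping a running sum.
    public static IEnumerable<ulong> FromDeltas(IEnumerable<ulong> deltas)
    {
        ulong current = 0;
        foreach (ulong delta in deltas)
        {
            current += delta;
            yield return current;
        }
    }
}
```

On your sample set 1,2,3,8,11,12,13,14, `ToRuns` yields (1, 2), (8, 0), (11, 3), which is essentially your 1..3,8,11..14 notation in a different form.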

Handling in code

I’d simply advise you to write a couple of methods that stream a file from disk into an IEnumerable<uint> (or ulong if you end up with bigger numbers), and do the reverse, while handling whatever you’ve implemented from the above.

If you do this lazily (using yield return to hand back each number as you read and decode it, and writing numbers to disk as they arrive rather than collecting them in memory first), you can keep your memory usage low whatever the size of the stored data.

(I think, but I’m not sure, that even the GZipStream and other compression streams will let you stream your data without having it all in memory.)
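Putting the pieces together, the pair of methods might look roughly like this. It is a sketch under assumptions of my own: the file holds gzip-compressed, 7-bit-encoded deltas of a strictly increasing sequence, and `NumberFile`, `Write` and `Read` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper (illustrative names, assumed file layout).
public static class NumberFile
{
    // Streams increasing numbers to disk as gzip-compressed, 7-bit-encoded deltas.
    public static void Write(string path, IEnumerable<ulong> sorted)
    {
        using (var file = File.Create(path))
        using (var gzip = new GZipStream(file, CompressionMode.Compress))
        {
            ulong previous = 0;
            foreach (ulong value in sorted)
            {
                ulong delta = value - previous;
                previous = value;
                while (delta >= 0x80UL)
                {
                    gzip.WriteByte((byte)((delta & 0x7FUL) | 0x80UL));
                    delta >>= 7;
                }
                gzip.WriteByte((byte)delta);
            }
        }
    }

    // Lazily yields the numbers back; only the current value is held in memory.
    public static IEnumerable<ulong> Read(string path)
    {
        using (var file = File.OpenRead(path))
        using (var gzip = new GZipStream(file, CompressionMode.Decompress))
        {
            ulong current = 0;
            while (true)
            {
                ulong delta = 0;
                int shift = 0;
                while (true)
                {
                    int b = gzip.ReadByte();
                    if (b < 0)
                    {
                        if (shift == 0) yield break;   // clean end of file
                        throw new EndOfStreamException("Truncated number.");
                    }
                    delta |= (ulong)(b & 0x7F) << shift;
                    if ((b & 0x80) == 0) break;        // last byte of this number
                    shift += 7;
                }
                current += delta;
                yield return current;
            }
        }
    }
}
```

Because `Read` is an iterator, the file is opened only when enumeration starts and closed when it finishes or the enumerator is disposed. Byte-at-a-time calls are simple but not fast; wrapping the streams in a BufferedStream may help if throughput matters.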

Querying

If you’re comparing two of your big data sets, I wouldn’t advise using LINQ’s Intersect method, as it requires reading one of the sources completely into memory. However, as you know both sequences are increasing, you can write a similar method that only needs to hold an enumerator for each sequence.
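A minimal sketch of such a method, assuming both inputs are strictly increasing (the names `SortedSequences` and `IntersectSorted` are made up for illustration):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (illustrative names); both inputs must be strictly increasing.
public static class SortedSequences
{
    // Walks both sequences in step, always advancing whichever one is behind.
    // Nothing is buffered beyond the two enumerators.
    public static IEnumerable<ulong> IntersectSorted(
        IEnumerable<ulong> first, IEnumerable<ulong> second)
    {
        using (var a = first.GetEnumerator())
        using (var b = second.GetEnumerator())
        {
            bool hasA = a.MoveNext();
            bool hasB = b.MoveNext();
            while (hasA && hasB)
            {
                if (a.Current < b.Current)
                {
                    hasA = a.MoveNext();
                }
                else if (a.Current > b.Current)
                {
                    hasB = b.MoveNext();
                }
                else
                {
                    yield return a.Current;   // value present in both
                    hasA = a.MoveNext();
                    hasB = b.MoveNext();
                }
            }
        }
    }
}
```

Intersecting your sample set 1,2,3,8,11,12,13,14 with the expanded query 1..5,11..12,14 this way gives 1,2,3,11,12,14, matching your Query 3.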

If you’re querying one of your data sets against a user-input, small list of numbers, you can happily use LINQ’s Intersect method as it is currently implemented, as it only needs the second sequence to be entirely in memory.
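For the small-query case, something like the following should do. Again this is a sketch: `SmallQuery`, `Lookup` and `Contains` are hypothetical names, and `Contains` assumes the stored sequence is increasing so it can stop early.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (illustrative names, not part of any library).
public static class SmallQuery
{
    // 'stored' can be a lazy stream from disk; Intersect loads only
    // 'wanted' (the second sequence) into a set before enumerating 'stored'.
    public static IEnumerable<ulong> Lookup(IEnumerable<ulong> stored, IEnumerable<ulong> wanted)
    {
        return stored.Intersect(wanted);
    }

    // "?2"-style existence check on an increasing sequence:
    // stops reading as soon as the stream passes the value.
    public static bool Contains(IEnumerable<ulong> sortedStored, ulong value)
    {
        foreach (ulong n in sortedStored)
        {
            if (n == value) return true;
            if (n > value) return false;
        }
        return false;
    }
}
```

Note that `Lookup` still reads the whole stored sequence once, so for a single-value check `Contains` is the cheaper option.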
