I need to store some data that follows the simple pattern of mapping an

Question

Asked: May 11, 2026 (2026-05-11T09:39:31+00:00)

I need to store some data that follows the simple pattern of mapping an ‘id’ to a full table (with multiple rows) of several columns (i.e. some integer values [u, v, w]). The size of one of these tables would be a couple of KB. Basically what I need is to store a persistent cache of some intermediary results.

This could quite easily be implemented in plain SQL, but there are a couple of problems. I need to compress this structure on disk as much as possible (because of the number of values I’m storing). Also, it’s not transactional: I just need to write once and then read back the contents of an entire table, so a relational DB isn’t actually a very good fit.

I was wondering if anyone had any good suggestions? For some reason I can’t seem to come up with something decent at the moment. Something with a Java API would be especially nice.

1 Answer

score 0 · Answer 1 · 2026-05-11T09:39:31+00:00

This sounds like a job for… new ObjectOutputStream(new FileOutputStream(STORAGE_DIR + "/" + key + ".dat")) !!

Seriously, the simplest method is to just create a file for each data table that you want to store, serialize the data into it, and look it up using the key as the filename when you want to read.
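As a minimal sketch of the file-per-key idea above: the class below serializes each table into its own file and reads it back by key. The class name FileTableStore, the int[][] table type, and the assumption that keys are safe to use as file names are all illustrative, not part of the original answer.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical store: one file per key, tables written with plain Java serialization. */
final class FileTableStore {
    private final Path storageDir;

    FileTableStore(Path storageDir) throws IOException {
        // Creates the directory (and parents) if it does not exist yet.
        this.storageDir = Files.createDirectories(storageDir);
    }

    // Assumes the key contains only characters that are valid in a file name.
    private Path fileFor(String key) {
        return storageDir.resolve(key + ".dat");
    }

    /** Serializes the whole table into the file named after the key. */
    void put(String key, int[][] table) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(Files.newOutputStream(fileFor(key)))) {
            out.writeObject(table);
        }
    }

    /** Returns the stored table, or null when no file exists for the key. */
    int[][] get(String key) throws IOException, ClassNotFoundException {
        Path file = fileFor(key);
        if (!Files.exists(file)) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (int[][]) in.readObject();
        }
    }
}
```

Since the question only needs write-once, read-whole-table access, nothing beyond put and get is sketched here.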

On a decent file system, writes can be made atomic (by writing to a temp file and then renaming it); read/write speed is measured in tens of MBit/second; lookups can be made very efficient by creating a simple directory tree like STORAGE_DIR + '/' + key.substring(0,2) + '/' + key.substring(0,4) + '/' + key, which should still be efficient with millions of entries, and even more efficient if your file system uses indexed directories; lastly, it’s trivial to implement a memory-backed LRU cache on top of this for even faster retrievals.
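The three ideas in that paragraph could look roughly like this. It is a sketch under assumptions: the names ShardedFiles and LruCache are made up for illustration, padding keys shorter than four characters with underscores is my own choice, and the LRU cache here leans on LinkedHashMap's access-order mode rather than anything from the original answer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.Map;

final class ShardedFiles {
    /** Maps "abcdef" to storageDir/ab/abcd/abcdef. */
    static Path pathFor(Path storageDir, String key) {
        // Assumption: short keys are padded so substring(0, 4) cannot fail.
        String padded = key.length() >= 4 ? key : (key + "____").substring(0, 4);
        return storageDir.resolve(padded.substring(0, 2))
                         .resolve(padded.substring(0, 4))
                         .resolve(key);
    }

    /** Writes to a temp file next to the target, then renames it into place. */
    static void writeAtomically(Path target, byte[] data) throws IOException {
        Files.createDirectories(target.getParent());
        // Same directory as the target, so the rename stays on one file system.
        Path tmp = Files.createTempFile(target.getParent(), "tmp-", ".part");
        try {
            Files.write(tmp, data);
            // May throw AtomicMoveNotSupportedException; whether an existing
            // target is replaced is implementation specific.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // no-op once the move has succeeded
        }
    }
}

/** Memory-backed LRU cache: evicts the least recently accessed entry when full. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

A read path would check the LruCache first, fall back to the file at pathFor(...), and put the loaded table into the cache.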

Regarding compression: you can use Jakarta’s commons-compress to apply gzip or even bzip2 compression to the data before you store it. But this is an optimization problem, and depending on your application and available disk space you may be better off investing the CPU cycles elsewhere.
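For gzip specifically, the JDK's own java.util.zip classes are enough, so the sketch below uses those instead of commons-compress (a deliberate swap to keep it dependency-free; bzip2 would need an external library). The class name GzipTables and the int[][] table type are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipTables {
    /** Serializes the table and gzips the serialized bytes. */
    static byte[] compress(int[][] table) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the outer stream also finishes the gzip stream.
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(table);
        }
        return bytes.toByteArray();
    }

    /** Reverses compress(): gunzips, then deserializes the table. */
    static int[][] decompress(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                 new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return (int[][]) in.readObject();
        }
    }
}
```

The resulting byte array can be written to the per-key file as-is; whether the space saved is worth the CPU time depends on the data.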

Here is a sample implementation that I made: http://geek.co.il/articles/geek-storage.zip. It uses a simple interface (which is far from being clean; it’s just a demonstration of the concept) that offers methods for storing and retrieving objects from a cache with a set maximum size. A cache miss is transferred to a user implementation for handling, and the cache will periodically check that it doesn’t exceed the storage requirements and will remove old data.

I also included a MySQL-backed implementation for completeness and a benchmark to compare the disk-based and MySQL-based implementations. On my home machine (an old Athlon 64) the disk implementation runs more than twice as fast as the MySQL implementation in the enclosed benchmark (9.01 seconds vs. 18.17 seconds). Even though the DB implementation can probably be tweaked for slightly better performance, I believe it demonstrates the problem well enough.

Feel free to use this as you see fit.
