Benchmarking shows that the cereal library takes 100x longer to deserialize a data structure

Question

0

Asked: June 5, 20262026-06-05T05:44:24+00:00 2026-06-05T05:44:24+00:00

Benchmarking shows that the cereal library takes 100x longer to deserialize a data structure

0

Benchmarking shows that the cereal library takes 100x longer to deserialize a data structure of mine (detailed below) than it takes to read the same data off the drive:

benchmarking Read
mean: 465.7050 us, lb 460.9873 us, ub 471.0938 us, ci 0.950
std dev: 25.79706 us, lb 22.19820 us, ub 30.81870 us, ci 0.950
found 4 outliers among 100 samples (4.0%)
  4 (4.0%) high mild
variance introduced by outliers: 53.460%
variance is severely inflated by outliers

benchmarking Read + Decode
collecting 100 samples, 1 iterations each, in estimated 6.356502 s
mean: 68.85135 ms, lb 67.65992 ms, ub 70.05832 ms, ci 0.950
std dev: 6.134430 ms, lb 5.607914 ms, ub 6.755639 ms, ci 0.950
variance introduced by outliers: 74.863%
variance is severely inflated by outliers

This is also supported by profiling typical deserialization usage of this data structure in a program of mine where 98% of the time is spent deserializing the data and 1% is IO plus the core algorithm:

COST CENTRE                    MODULE               %time %alloc

getWord8                       Data.Serialize.Get    30.5   40.4
unGet                          Data.Serialize.Get    29.5   17.9
getWord64be                    Data.Serialize.Get    14.0   10.7
getListOf                      Data.Serialize.Get    10.2   12.8
roll                           Data.Serialize         8.2   11.5
shiftl_w64                     Data.Serialize.Get     3.4    2.9
decode                         Data.Serialize         2.9    3.1
main                           Main                   1.3    0.6

The data structure I’m deserializing is an IntMap [Triplet Atom] and the definitions of the component types are given below:

type Triplet a = (a, a, a)

data Point = Point {
    _x :: {-# UNPACK #-} !Double ,
    _y :: {-# UNPACK #-} !Double ,
    _z :: {-# UNPACK #-} !Double }

data Atom = Atom {
    _serial :: {-# UNPACK #-} !Int    ,
    _r      :: {-# UNPACK #-} !Point  ,
    _n      :: {-# UNPACK #-} !Word64 }

I’m using the default IntMap, (,,) and [] instances provided by cereal, and the following types and instances for my custom types:

instance Serialize Point where
    put (Point x y z) = do
        put x
        put y
        put z
    get = Point <$> get <*> get <*> get

instance Serialize Atom where
    put (Atom s r n) = do
        put s
        put r
        put n
    get = Atom <$> get <*> get <*> get

So my questions are:

Why is deserialization so slow in general?
Is there any way to change my data structure (i.e. IntMap/[]) to make the deserialization go faster?
Is there any way to change my data types (i.e. Atom/Point) to make deserialization go faster?
Are there faster alternatives than cereal within Haskell, or should I store the data structure in C-land to do more rapid deserialization (i.e. use mmap)?

These files I am deserializing are being used for sub-indices for a search engine since the full index cannot fit in memory for the target computer (which is a consumer-grade desktop), so I store each sub-index on disk and read+decode the sub-indices pointed to by the initial global index residing in memory. Also, I’m not concerned about serialization speed since searching the index is the bottle-neck for the end user and the current serialization performance of cereal is satisfactory for generating and updating the index.

Edit:

Tried out Don’s suggestion of using a space-efficient triplet, and this quadrupled the speed:

benchmarking Read
mean: 468.9671 us, lb 464.2564 us, ub 473.8867 us, ci 0.950
std dev: 24.67863 us, lb 21.71392 us, ub 28.39479 us, ci 0.950
found 2 outliers among 100 samples (2.0%)
  2 (2.0%) high mild
variance introduced by outliers: 50.474%
variance is severely inflated by outliers

benchmarking Read + Decode
mean: 15.04670 ms, lb 14.99097 ms, ub 15.10520 ms, ci 0.950
std dev: 292.7815 us, lb 278.8742 us, ub 308.1960 us, ci 0.950
variance introduced by outliers: 12.303%
variance is moderately inflated by outliers

However, it still remains the bottleneck using up 25x as much time as IO. Also, can anybody explain why Don’s suggestion works? Does this mean if I switched to something other than a list (like an array?) that it might give an improvement, too?

Edit #2: Just switched to latest Haskell platform and reran profiling for cereal. The information is considerably more detailed and I’ve provided an hpaste of it.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-05T05:44:25+00:00

Ok. To answer this with the summary of the advice. For fast deserialization of data:

Use cereal (strict bytestring output) or binary (lazy bytetring output)
Make sure you’re compiling with -O2, as these libraries rely on inlining to remove overhead
Use dense data types, such as replacing a polymorphic tuple with an unpacked, specialized form.
Avoid converting data types to lists to serialize them. If you have bytestrings, this is taken care of. For unpacked array types, you usually will get very fast IO, but worth double checking the instances
You may be able to use mmap’d IO
For double-heavy data consider a more efficient double reader.
Use modern array and container types tuned for performance, with more recent GHC versions.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

Benchmarking shows that the cereal library takes 100x longer to deserialize a data structure

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply