We have recently compared the respective file sizes of the same tabular data (think

Question

Asked: May 26, 2026

We have recently compared the respective file sizes of the same tabular data (think single table, half a dozen columns, describing a product catalog) serialized with ProtoBuf.NET or with TSV (tab-separated values), both files compressed with GZip afterward (default .NET implementation).

I have been surprised to notice that the compressed ProtoBuf.NET version takes a lot more space than the text version (up to 3x more). ~~My pet theory is that ProtoBuf does not respect the byte semantic and consequently mismatches the GZip frequency compression tree; hence a relatively inefficient compression.~~

Another possibility is that ProtoBuf encodes, in fact, a lot more data (to facilitate schema versioning for example), hence the serialized formats are not strictly comparable information-wise.

Is anybody observing the same problem? Is it even worth compressing ProtoBuf?

1 Answer

Editorial Team · Answer 1 · 2026-05-26T17:29:11+00:00

There are a number of possible factors here. Firstly, note that the protocol buffers wire format uses straight UTF-8 encoding for strings; if your data is dominated by strings, it will ultimately need about the same amount of space as it would for TSV.
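
To make the string point concrete, here is a minimal sketch that computes field sizes by hand from the protocol buffers wire format (a varint field key, a varint byte length, then the UTF-8 payload). `WireSize` and its members are hypothetical helpers written for illustration; they are not part of protobuf-net, and they assume non-negative lengths and field numbers.

```csharp
using System;
using System.Text;

// Hypothetical helper (not a protobuf-net API): sizes derived by hand
// from the protocol buffers wire format.
static class WireSize
{
    // Bytes needed for a base-128 varint: 7 payload bits per byte.
    public static int Varint(ulong value)
    {
        int bytes = 1;
        while (value >= 0x80) { value >>= 7; bytes++; }
        return bytes;
    }

    // Field key = (fieldNumber << 3) | wireType, itself stored as a varint.
    // The wire type only occupies the low 3 bits, so it does not change the size.
    static int Key(int fieldNumber)
    {
        return Varint((ulong)fieldNumber << 3);
    }

    // Length-delimited string field: key + length varint + UTF-8 payload.
    public static int StringField(int fieldNumber, string s)
    {
        int payload = Encoding.UTF8.GetByteCount(s);
        return Key(fieldNumber) + Varint((ulong)payload) + payload;
    }

    // One TSV cell: the same UTF-8 payload plus a single separator byte.
    public static int TsvCell(string s)
    {
        return Encoding.UTF8.GetByteCount(s) + 1;
    }
}
```

For a name such as "Lorem ipsum" in field 2, `WireSize.StringField(2, "Lorem ipsum")` gives 13 bytes against 12 for `WireSize.TsvCell("Lorem ipsum")`: the payload is identical, and only the framing differs by a byte.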

Protocol buffers is also designed to help store structured data, i.e. more complex models than the single-table scenario. This doesn’t contribute hugely to the size, but start comparing with XML/JSON etc. (which are more similar in terms of capability) and the difference is more obvious.

Additionally, since protocol buffers is pretty dense (UTF-8 notwithstanding), in some cases compressing it can actually make it bigger – you might want to check if this is the case here.
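
One way to check whether compression helps or hurts for a given payload is to compare the sizes directly. The sketch below uses the same `GZipStream` as the benchmark; `GzipCheck` and its methods are hypothetical names for illustration, and random bytes stand in for data that is already dense.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration: measures what gzip does to a buffer.
static class GzipCheck
{
    // Returns the gzip-compressed size of a buffer, in bytes.
    public static int CompressedSize(byte[] raw)
    {
        using (var ms = new MemoryStream())
        {
            using (var gzip = new GZipStream(ms, CompressionMode.Compress, true))
            {
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream flushes it and writes the trailer
            return (int)ms.Length;
        }
    }

    // Pseudo-random bytes: a stand-in for data with no redundancy left to remove.
    public static byte[] RandomBytes(int seed, int count)
    {
        var buffer = new byte[count];
        new Random(seed).NextBytes(buffer);
        return buffer;
    }
}
```

With this, `GzipCheck.CompressedSize(GzipCheck.RandomBytes(12345, 4096))` comes out larger than 4096, because the gzip header and trailer are added to data that cannot shrink, while a buffer of 4096 zero bytes compresses to a small fraction of its size. Running the same comparison on the serialized catalog would show which case applies.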

In a quick sample for the scenario you present, both formats give roughly the same sizes – there is no massive jump:

protobuf-net, no compression: 2498720 bytes, write 34ms, read 72ms, chk 50000
protobuf-net, gzip: 1521215 bytes, write 234ms, read 146ms, chk 50000
tsv, no compression: 2492591 bytes, write 74ms, read 122ms, chk 50000
tsv, gzip: 1258500 bytes, write 238ms, read 169ms, chk 50000
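
The ratios behind these figures can be worked out with a few lines. This is only arithmetic on the sizes printed above; `SizeRatios` is a hypothetical name used for illustration.

```csharp
using System;

// Hypothetical helper: sizes in bytes copied from the benchmark output above.
static class SizeRatios
{
    public const long ProtoRaw = 2498720, ProtoGzip = 1521215;
    public const long TsvRaw = 2492591, TsvGzip = 1258500;

    // Plain floating-point ratio of two sizes.
    public static double Ratio(long numerator, long denominator)
    {
        return (double)numerator / denominator;
    }
}
```

`Ratio(ProtoGzip, ProtoRaw)` is about 0.61 and `Ratio(TsvGzip, TsvRaw)` is about 0.50, so gzip helps both formats, the text one more. `Ratio(ProtoGzip, TsvGzip)` is about 1.21, well short of the 3x reported in the question.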

The TSV is marginally smaller in this case (about 17% smaller after gzip; the uncompressed sizes are within 0.3% of each other), but ultimately TSV is indeed a very simple format (with very limited capabilities in terms of structured data), so it is no surprise that it is quick.

Indeed; if all you are storing is a very simple single table, TSV is not a bad option – however, it is ultimately a very limited format. I can’t reproduce your “much bigger” example.

In addition to the richer support for structured data (and other features), protobuf places a lot of emphasis on processing performance too. Since TSV is pretty simple, the edge here won’t be massive (though it is noticeable in the timings above), but again: compare against XML, JSON, or the built-in BinaryFormatter, which are formats with similar features, and the difference is obvious.

Example for the numbers above (updated to use BufferedStream):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;
using System.Text;
using ProtoBuf;
static class Program
{
    static void Main()
    {
        RunTest(12345, 1, new StringWriter()); // let everyone JIT etc
        RunTest(12345, 50000, Console.Out); // actual test
        Console.WriteLine("(done)");
        Console.ReadLine();
    }
    static void RunTest(int seed, int count, TextWriter cout)
    {

        var data = InventData(seed, count);

        byte[] raw;
        Catalog catalog;
        var write = Stopwatch.StartNew();
        using(var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, data);
            raw = ms.ToArray();
        }
        write.Stop();

        var read = Stopwatch.StartNew();
        using(var ms = new MemoryStream(raw))
        {
            catalog = Serializer.Deserialize<Catalog>(ms);
        }
        read.Stop();

        cout.WriteLine("protobuf-net, no compression: {0} bytes, write {1}ms, read {2}ms, chk {3}", raw.Length, write.ElapsedMilliseconds, read.ElapsedMilliseconds, catalog.Products.Count);
        raw = null; catalog = null;

        write = Stopwatch.StartNew();
        using (var ms = new MemoryStream())   
        {
            using (var gzip = new GZipStream(ms, CompressionMode.Compress, true))
            using (var bs = new BufferedStream(gzip, 64 * 1024))
            {
                Serializer.Serialize(bs, data);
            } // need to close gzip to flush it (flush doesn't flush)
            raw = ms.ToArray();
        }
        write.Stop();

        read = Stopwatch.StartNew();
        using(var ms = new MemoryStream(raw))
        using(var gzip = new GZipStream(ms, CompressionMode.Decompress, true))
        {
            catalog = Serializer.Deserialize<Catalog>(gzip);
        }
        read.Stop();

        cout.WriteLine("protobuf-net, gzip: {0} bytes, write {1}ms, read {2}ms, chk {3}", raw.Length, write.ElapsedMilliseconds, read.ElapsedMilliseconds, catalog.Products.Count);
        raw = null; catalog = null;

        write = Stopwatch.StartNew();
        using (var ms = new MemoryStream())
        {
            using (var writer = new StreamWriter(ms))
            {
                WriteTsv(data, writer);
            }
            raw = ms.ToArray();
        }
        write.Stop();

        read = Stopwatch.StartNew();
        using (var ms = new MemoryStream(raw))
        using (var reader = new StreamReader(ms))
        {
            catalog = ReadTsv(reader);
        }
        read.Stop();

        cout.WriteLine("tsv, no compression: {0} bytes, write {1}ms, read {2}ms, chk {3}", raw.Length, write.ElapsedMilliseconds, read.ElapsedMilliseconds, catalog.Products.Count);
        raw = null; catalog = null;

        write = Stopwatch.StartNew();
        using (var ms = new MemoryStream())
        {
            using (var gzip = new GZipStream(ms, CompressionMode.Compress))
            using(var bs = new BufferedStream(gzip, 64 * 1024))
            using(var writer = new StreamWriter(bs))
            {
                WriteTsv(data, writer);
            }
            raw = ms.ToArray();
        }
        write.Stop();

        read = Stopwatch.StartNew();
        using(var ms = new MemoryStream(raw))
        using(var gzip = new GZipStream(ms, CompressionMode.Decompress, true))
        using(var reader = new StreamReader(gzip))
        {
            catalog = ReadTsv(reader);
        }
        read.Stop();

        cout.WriteLine("tsv, gzip: {0} bytes, write {1}ms, read {2}ms, chk {3}", raw.Length, write.ElapsedMilliseconds, read.ElapsedMilliseconds, catalog.Products.Count);
    }

    private static Catalog ReadTsv(StreamReader reader)
    {
        string line;
        List<Product> list = new List<Product>();
        while((line = reader.ReadLine()) != null)
        {
            string[] parts = line.Split('\t');
            var row = new Product();
            row.Id = int.Parse(parts[0]);
            row.Name = parts[1];
            row.QuantityAvailable = int.Parse(parts[2]);
            row.Price = decimal.Parse(parts[3]);
            row.Weight = int.Parse(parts[4]);
            row.Sku = parts[5];
            list.Add(row);
        }
        return new Catalog {Products = list};
    }
    private static void WriteTsv(Catalog catalog, StreamWriter writer)
    {
        foreach (var row in catalog.Products)
        {
            writer.Write(row.Id);
            writer.Write('\t');
            writer.Write(row.Name);
            writer.Write('\t');
            writer.Write(row.QuantityAvailable);
            writer.Write('\t');
            writer.Write(row.Price);
            writer.Write('\t');
            writer.Write(row.Weight);
            writer.Write('\t');
            writer.Write(row.Sku);
            writer.WriteLine();
        }
    }
    static Catalog InventData(int seed, int count)
    {
        string[] lipsum =
            @"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
                .Split(' ');
        char[] skuChars = "0123456789abcdef".ToCharArray();
        Random rand = new Random(seed);
        var list = new List<Product>(count);
        int id = 0;
        for (int i = 0; i < count; i++)
        {
            var row = new Product();
            row.Id = id++;
            var name = new StringBuilder(lipsum[rand.Next(lipsum.Length)]);
            int wordCount = rand.Next(0,5);
            for (int j = 0; j < wordCount; j++)
            {
                name.Append(' ').Append(lipsum[rand.Next(lipsum.Length)]);
            }
            row.Name = name.ToString();
            row.QuantityAvailable = rand.Next(1000);
            row.Price = rand.Next(10000)/100M;
            row.Weight = rand.Next(100);
            char[] sku = new char[10];
            for(int j = 0 ; j < sku.Length ; j++)
                sku[j] = skuChars[rand.Next(skuChars.Length)];
            row.Sku = new string(sku);
            list.Add(row);
        }
        return new Catalog {Products = list};
    }
}
[ProtoContract]
public class Catalog
{
    [ProtoMember(1, DataFormat = DataFormat.Group)]
    public List<Product> Products { get; set; } 
}
[ProtoContract]
public class Product
{
    [ProtoMember(1)]
    public int Id { get; set; }
    [ProtoMember(2)]
    public string Name { get; set; }
    [ProtoMember(3)]
    public int QuantityAvailable { get; set;}
    [ProtoMember(4)]
    public decimal Price { get; set; }
    [ProtoMember(5)]
    public int Weight { get; set; }
    [ProtoMember(6)]
    public string Sku { get; set; }
}
