The Archive Base


Editorial Team
Asked: May 26, 2026 (2026-05-26T17:29:11+00:00)

We have recently compared the respective file sizes of the same tabular data (think single table, half a dozen columns, describing a product catalog) serialized with ProtoBuf.NET or with TSV (tab-separated values), both files compressed with GZip afterward (default .NET implementation).

I was surprised to notice that the compressed ProtoBuf.NET version takes a lot more space than the text version (up to 3x more). My pet theory is that ProtoBuf does not respect byte semantics and consequently mismatches the GZip frequency compression tree; hence a relatively inefficient compression.

Another possibility is that ProtoBuf in fact encodes a lot more data (to facilitate schema versioning, for example), hence the serialized formats are not strictly comparable information-wise.

Has anybody observed the same problem? Is it even worth compressing ProtoBuf?


1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 5:29 pm (2026-05-26T17:29:11+00:00)

    There are a number of possible factors here. Firstly, note that the protocol buffers wire format uses straight UTF-8 encoding for strings; if your data is dominated by strings, it will ultimately need about the same amount of space as it would for TSV.

    Protocol buffers is also designed to help store structured data, i.e. more complex models than the single-table scenario. This doesn’t contribute hugely to the size, but start comparing with xml/json etc. (which are more similar in terms of capability) and the difference is more obvious.
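To make the string point concrete, here is a minimal sketch that hand-encodes a single protobuf string field (tag byte, varint length, UTF-8 payload) and compares it with the cost of the same value in TSV (UTF-8 payload plus one tab). The class and method names are hypothetical, and the encoder is written by hand for illustration rather than taken from protobuf-net; it assumes a field number of 15 or less so the tag fits in one byte.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class StringFieldSizeSketch
{
    // Hand-encodes one length-delimited protobuf field holding a string.
    // Assumption: fieldNumber is in 1..15, so the tag is a single byte.
    public static byte[] EncodeStringField(int fieldNumber, string value)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(value);
        var output = new List<byte>();
        output.Add((byte)((fieldNumber << 3) | 2)); // wire type 2 = length-delimited

        // Length prefix as a base-128 varint, least significant group first.
        uint length = (uint)utf8.Length;
        while (length >= 0x80)
        {
            output.Add((byte)((length & 0x7F) | 0x80));
            length >>= 7;
        }
        output.Add((byte)length);

        output.AddRange(utf8); // the payload itself is plain UTF-8
        return output.ToArray();
    }

    // TSV cost of the same value: UTF-8 payload plus one tab delimiter.
    public static int TsvFieldSize(string value)
    {
        return Encoding.UTF8.GetByteCount(value) + 1;
    }

    static void Main()
    {
        string name = "Lorem ipsum dolor";
        Console.WriteLine("protobuf field: {0} bytes", EncodeStringField(2, name).Length);
        Console.WriteLine("tsv field:      {0} bytes", TsvFieldSize(name));
    }
}
```

For the 17-character sample this comes to 19 bytes against 18, so for string-heavy rows neither format has a meaningful size advantage before compression.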

    Additionally, since protocol buffers is pretty dense (UTF-8 notwithstanding), in some cases compressing it can actually make it bigger – you might want to check if this is the case here.
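One way to check whether compression is making things bigger is to gzip a buffer and compare lengths. The sketch below uses pseudo-random bytes as a stand-in for already-dense data (an assumption for illustration; real protobuf output is usually less dense than this) and an all-zero buffer as the highly compressible contrast. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class DenseDataGzipSketch
{
    // Compresses a buffer with GZipStream and returns the compressed bytes.
    public static byte[] Gzip(byte[] input)
    {
        using (var ms = new MemoryStream())
        {
            using (var gzip = new GZipStream(ms, CompressionMode.Compress, true))
            {
                gzip.Write(input, 0, input.Length);
            } // disposing the GZipStream writes the final block and trailer
            return ms.ToArray();
        }
    }

    static void Main()
    {
        // Stand-in for dense data: pseudo-random bytes have no redundancy to exploit.
        byte[] dense = new byte[64 * 1024];
        new Random(12345).NextBytes(dense);

        // Contrast: an all-zero buffer is about as compressible as data gets.
        byte[] repetitive = new byte[64 * 1024];

        Console.WriteLine("dense:      {0} -> {1} bytes", dense.Length, Gzip(dense).Length);
        Console.WriteLine("repetitive: {0} -> {1} bytes", repetitive.Length, Gzip(repetitive).Length);
    }
}
```

The dense buffer comes out larger than it went in, because the gzip header, trailer and block framing are added to data that cannot be shrunk. Running the same comparison on your own serialized file shows which side of that line it falls on.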

    In a quick sample for the scenario you present, both formats give roughly the same sizes – there is no massive jump:

    protobuf-net, no compression: 2498720 bytes, write 34ms, read 72ms, chk 50000
    protobuf-net, gzip: 1521215 bytes, write 234ms, read 146ms, chk 50000
    tsv, no compression: 2492591 bytes, write 74ms, read 122ms, chk 50000
    tsv, gzip: 1258500 bytes, write 238ms, read 169ms, chk 50000
    

    The gzipped TSV is somewhat smaller in this case (about 1.26 MB versus 1.52 MB), but ultimately TSV is indeed a very simple format (with very limited capabilities in terms of structured data), so it is no surprise that it is quick.
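As a quick check on the figures above, this sketch computes the compression ratios directly from the four reported byte counts. Only the byte counts come from the measurements; the class and method names are hypothetical.

```csharp
using System;

static class RatioSketch
{
    // Plain size ratio; values below 1 mean the numerator is the smaller file.
    public static double Ratio(long numerator, long denominator)
    {
        return (double)numerator / denominator;
    }

    static void Main()
    {
        // Byte counts copied from the sample run above.
        const long protoRaw = 2498720, protoGzip = 1521215;
        const long tsvRaw = 2492591, tsvGzip = 1258500;

        Console.WriteLine("protobuf-net gzip/raw: {0:F3}", Ratio(protoGzip, protoRaw));
        Console.WriteLine("tsv gzip/raw:          {0:F3}", Ratio(tsvGzip, tsvRaw));
        Console.WriteLine("gzip protobuf/tsv:     {0:F3}", Ratio(protoGzip, tsvGzip));
    }
}
```

Gzip shrinks the protobuf-net output to about 61% of its raw size and the TSV to about 50%, leaving the compressed protobuf-net file roughly 1.21 times the size of the compressed TSV. That is a real gap, but nowhere near the 3x reported in the question.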

    Indeed; if all you are storing is a very simple single table, TSV is not a bad option – however, it is ultimately a very limited format. I can’t reproduce your “much bigger” example.

    In addition to the richer support for structured data (and other features), protobuf places a lot of emphasis on processing performance too. Since TSV is pretty simple, the edge here won’t be massive (though it is noticeable in the timings above), but again: compare against xml, json, or the inbuilt BinaryFormatter, which are formats with similar features, and the difference is obvious.


    Example for the numbers above (updated to use BufferedStream):

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.IO;
    using System.IO.Compression;
    using System.Text;
    using ProtoBuf;
    static class Program
    {
        static void Main()
        {
            RunTest(12345, 1, new StringWriter()); // let everyone JIT etc
            RunTest(12345, 50000, Console.Out); // actual test
            Console.WriteLine("(done)");
            Console.ReadLine();
        }
        static void RunTest(int seed, int count, TextWriter cout)
        {
    
            var data = InventData(seed, count);
    
            byte[] raw;
            Catalog catalog;
            var write = Stopwatch.StartNew();
            using(var ms = new MemoryStream())
            {
                Serializer.Serialize(ms, data);
                raw = ms.ToArray();
            }
            write.Stop();
    
            var read = Stopwatch.StartNew();
            using(var ms = new MemoryStream(raw))
            {
                catalog = Serializer.Deserialize<Catalog>(ms);
            }
            read.Stop();
    
            cout.WriteLine("protobuf-net, no compression: {0} bytes, write {1}ms, read {2}ms, chk {3}", raw.Length, write.ElapsedMilliseconds, read.ElapsedMilliseconds, catalog.Products.Count);
            raw = null; catalog = null;
    
            write = Stopwatch.StartNew();
            using (var ms = new MemoryStream())   
            {
                using (var gzip = new GZipStream(ms, CompressionMode.Compress, true))
                using (var bs = new BufferedStream(gzip, 64 * 1024))
                {
                    Serializer.Serialize(bs, data);
                } // need to close gzip to flush it (flush doesn't flush)
                raw = ms.ToArray();
            }
            write.Stop();
    
            read = Stopwatch.StartNew();
            using(var ms = new MemoryStream(raw))
            using(var gzip = new GZipStream(ms, CompressionMode.Decompress, true))
            {
                catalog = Serializer.Deserialize<Catalog>(gzip);
            }
            read.Stop();
    
            cout.WriteLine("protobuf-net, gzip: {0} bytes, write {1}ms, read {2}ms, chk {3}", raw.Length, write.ElapsedMilliseconds, read.ElapsedMilliseconds, catalog.Products.Count);
            raw = null; catalog = null;
    
            write = Stopwatch.StartNew();
            using (var ms = new MemoryStream())
            {
                using (var writer = new StreamWriter(ms))
                {
                    WriteTsv(data, writer);
                }
                raw = ms.ToArray();
            }
            write.Stop();
    
            read = Stopwatch.StartNew();
            using (var ms = new MemoryStream(raw))
            using (var reader = new StreamReader(ms))
            {
                catalog = ReadTsv(reader);
            }
            read.Stop();
    
            cout.WriteLine("tsv, no compression: {0} bytes, write {1}ms, read {2}ms, chk {3}", raw.Length, write.ElapsedMilliseconds, read.ElapsedMilliseconds, catalog.Products.Count);
            raw = null; catalog = null;
    
            write = Stopwatch.StartNew();
            using (var ms = new MemoryStream())
            {
                using (var gzip = new GZipStream(ms, CompressionMode.Compress))
                using(var bs = new BufferedStream(gzip, 64 * 1024))
                using(var writer = new StreamWriter(bs))
                {
                    WriteTsv(data, writer);
                }
                raw = ms.ToArray();
            }
            write.Stop();
    
            read = Stopwatch.StartNew();
            using(var ms = new MemoryStream(raw))
            using(var gzip = new GZipStream(ms, CompressionMode.Decompress, true))
            using(var reader = new StreamReader(gzip))
            {
                catalog = ReadTsv(reader);
            }
            read.Stop();
    
            cout.WriteLine("tsv, gzip: {0} bytes, write {1}ms, read {2}ms, chk {3}", raw.Length, write.ElapsedMilliseconds, read.ElapsedMilliseconds, catalog.Products.Count);
        }
    
        private static Catalog ReadTsv(StreamReader reader)
        {
            string line;
            List<Product> list = new List<Product>();
            while((line = reader.ReadLine()) != null)
            {
                string[] parts = line.Split('\t');
                var row = new Product();
                row.Id = int.Parse(parts[0]);
                row.Name = parts[1];
                row.QuantityAvailable = int.Parse(parts[2]);
                row.Price = decimal.Parse(parts[3]);
                row.Weight = int.Parse(parts[4]);
                row.Sku = parts[5];
                list.Add(row);
            }
            return new Catalog {Products = list};
        }
        private static void WriteTsv(Catalog catalog, StreamWriter writer)
        {
            foreach (var row in catalog.Products)
            {
                writer.Write(row.Id);
                writer.Write('\t');
                writer.Write(row.Name);
                writer.Write('\t');
                writer.Write(row.QuantityAvailable);
                writer.Write('\t');
                writer.Write(row.Price);
                writer.Write('\t');
                writer.Write(row.Weight);
                writer.Write('\t');
                writer.Write(row.Sku);
                writer.WriteLine();
            }
        }
        static Catalog InventData(int seed, int count)
        {
            string[] lipsum =
                @"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
                    .Split(' ');
            char[] skuChars = "0123456789abcdef".ToCharArray();
            Random rand = new Random(seed);
            var list = new List<Product>(count);
            int id = 0;
            for (int i = 0; i < count; i++)
            {
                var row = new Product();
                row.Id = id++;
                var name = new StringBuilder(lipsum[rand.Next(lipsum.Length)]);
                int wordCount = rand.Next(0,5);
                for (int j = 0; j < wordCount; j++)
                {
                    name.Append(' ').Append(lipsum[rand.Next(lipsum.Length)]);
                }
                row.Name = name.ToString();
                row.QuantityAvailable = rand.Next(1000);
                row.Price = rand.Next(10000)/100M;
                row.Weight = rand.Next(100);
                char[] sku = new char[10];
                for(int j = 0 ; j < sku.Length ; j++)
                    sku[j] = skuChars[rand.Next(skuChars.Length)];
                row.Sku = new string(sku);
                list.Add(row);
            }
            return new Catalog {Products = list};
        }
    }
    [ProtoContract]
    public class Catalog
    {
        [ProtoMember(1, DataFormat = DataFormat.Group)]
        public List<Product> Products { get; set; } 
    }
    [ProtoContract]
    public class Product
    {
        [ProtoMember(1)]
        public int Id { get; set; }
        [ProtoMember(2)]
        public string Name { get; set; }
        [ProtoMember(3)]
        public int QuantityAvailable { get; set;}
        [ProtoMember(4)]
        public decimal Price { get; set; }
        [ProtoMember(5)]
        public int Weight { get; set; }
        [ProtoMember(6)]
        public string Sku { get; set; }
    }
    
© 2021 The Archive Base. All Rights Reserved