The Archive Base

Editorial Team
Asked: June 13, 2026

My question has 2 sub-questions:

  1. Let’s assume a case where every second I receive data represented as a set of key/value tuples. Each value is basically a 64-bit counter, and I need to save it into a database. There are several thousand values, but only about 1% of them carry actual data; the rest are null (a sparsely populated set). Does it make sense to create a table with a few thousand columns, or should I just store rows of “id, timestamp, key, value”?

  2. If the answer to question 1 is “thousands of columns”, which database from the MySQL/PostgreSQL family should be used?

The read pattern for this case is mostly charting, so selects will fetch batches of data by timestamp. In short: uniform writes at 1 per second, plus occasional reads of either all data or the data in a date/time range.

Bonus question: what pattern can be used to store such data in a NoSQL database? For example, in MongoDB, a collection of stats whose documents contain only the populated 1% of the whole set could be used. How would that work with read/map/reduce? How would reading the data compare with MySQL/PostgreSQL?

Edit: My use case is very similar to the NewRelic service, but instead of lots of small datasets I have much larger datasets (sparsely populated out of an even bigger set) that arrive less often, and with fewer users.

1 Answer

    Editorial Team added an answer on June 13, 2026 at 2:20 am

    PostgreSQL records null columns in a per-row bitmap, so each null costs only one bit; however, there is a large fixed overhead for each row. Let’s calculate the storage efficiency of the two storage schemes:

    Average row length for wide table with thousands of columns:
    23 bytes row header + 1000*1bit + average 2 bytes of alignment + 4 bytes id
       + 8 bytes timestamp + 10*8 bytes values = 242 bytes
    
    Average number of bytes for storing each value separately:
    10 values * (23 bytes row header + 1 byte alignment + 4 bytes id
       + 8 bytes timestamp + 4 bytes key + 8 bytes value) = 480 bytes
    

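    So a table with a thousand columns is about twice as efficient as splitting the data out by key. With the byte counts above, the crossover point where it becomes more efficient to store keys separately is at a fill factor of about 0.4% (roughly 4 populated values out of 1000).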
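The arithmetic above can be reproduced with a short script. This is only a sketch built on the byte counts quoted in this answer (23-byte row header, 4-byte id, 8-byte timestamp, and the assumed average alignment padding); the function names are made up for illustration, and real on-disk sizes will vary with PostgreSQL version and column alignment.

```python
# Byte counts taken from the estimate in this answer; treat them as assumptions.
ROW_HEADER = 23
ID_BYTES = 4
TS_BYTES = 8
KEY_BYTES = 4
VALUE_BYTES = 8


def wide_row_bytes(columns: int, populated: int, alignment: int = 2) -> float:
    """Size of one wide row: nulls cost one bit each in the null bitmap."""
    null_bitmap = columns / 8
    return (ROW_HEADER + null_bitmap + alignment + ID_BYTES + TS_BYTES
            + populated * VALUE_BYTES)


def narrow_rows_bytes(populated: int, alignment: int = 1) -> int:
    """Total size of one (id, timestamp, key, value) row per populated value."""
    per_row = (ROW_HEADER + alignment + ID_BYTES + TS_BYTES
               + KEY_BYTES + VALUE_BYTES)
    return populated * per_row


def break_even_populated(columns: int) -> float:
    """Number of populated values at which both layouts cost the same."""
    fixed_wide = wide_row_bytes(columns, 0)
    per_value_narrow = narrow_rows_bytes(1)
    # wide = fixed + 8n, narrow = per_value_narrow * n; solve for n.
    return fixed_wide / (per_value_narrow - VALUE_BYTES)


if __name__ == "__main__":
    print(wide_row_bytes(1000, 10))      # wide layout, 1% fill
    print(narrow_rows_bytes(10))         # key/value layout, 1% fill
    print(break_even_populated(1000))    # populated values at break-even
```

Changing `columns` or the alignment guesses shows how sensitive the break-even point is to those assumptions.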

    This approach won’t scale very far, however. The maximum number of columns in a PostgreSQL table is limited to 1600. To go beyond that, you could split the values vertically across several tables. That brings some querying issues of its own, because a result set can’t have much more than 1600 columns either.
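As a rough illustration of the vertical split, the application needs some fixed rule for deciding which table and column a given key lives in. The sketch below assumes every split table repeats the id and timestamp columns and fills the rest with value columns; `locate_key` and `tables_needed` are hypothetical helper names, not part of any library.

```python
MAX_COLUMNS = 1600     # per-table column limit mentioned in this answer
RESERVED_COLUMNS = 2   # assumption: each split table repeats id and timestamp
VALUE_COLUMNS = MAX_COLUMNS - RESERVED_COLUMNS


def locate_key(key_index: int) -> tuple[int, int]:
    """Map a zero-based key index to (table number, column slot in that table)."""
    return divmod(key_index, VALUE_COLUMNS)


def tables_needed(total_keys: int) -> int:
    """How many split tables are required to hold total_keys value columns."""
    return -(-total_keys // VALUE_COLUMNS)  # ceiling division


if __name__ == "__main__":
    print(tables_needed(5000))
    print(locate_key(5000))
```

A chart that needs keys from several of these tables has to join them on id or timestamp, which is where the querying friction comes from.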

    Another option is to encode the key/value pairs into arrays. The structure of the table in this case would be (id serial, ts timestamptz, keys int2[], values int8[]). The storage overhead for the same 1000 attributes at a 1% fill factor would be:

    23 bytes row header + 1 byte alignment + 4 bytes id + 8 bytes timestamp
       + 20 bytes array header + 10*2 byte keys + 20 bytes array header
       + 10*8 byte values = 176 bytes per entry
    

    However, querying individual values requires slightly more infrastructure in this case.
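To give a feel for that extra infrastructure, here is a client-side sketch of pulling one key's time series out of rows stored as parallel arrays. It assumes each row arrives as a (timestamp, keys, values) tuple and that the keys array is kept sorted in ascending order, which the table definition above does not itself guarantee; `lookup` and `series` are illustrative names.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def lookup(keys: Sequence[int], values: Sequence[int], key: int) -> Optional[int]:
    """Return the counter stored for key, or None if the key is absent (null).

    Assumes keys is sorted ascending and values[i] belongs to keys[i].
    """
    pos = bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return values[pos]
    return None


def series(rows, key: int):
    """Collect (timestamp, value) points for one key, skipping rows without it."""
    points = []
    for ts, keys, values in rows:
        value = lookup(keys, values, key)
        if value is not None:
            points.append((ts, value))
    return points


if __name__ == "__main__":
    rows = [
        (1, [3, 17], [100, 200]),
        (2, [17, 40], [210, 5]),
        (3, [3], [101]),
    ]
    print(series(rows, 17))
```

The same lookup could instead live in the database as a function, which avoids shipping whole arrays to the client for a single-key chart.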

    If even better storage efficiency or flexibility is needed, a custom datatype can be added.

    I know that the pattern of using a large number of columns for sensor data is used successfully in many PostgreSQL installations. As for database choice, I may be slightly biased, but I would suggest PostgreSQL, because you’ll have much better tools, such as arrays, partial (predicate) indexes and custom datatypes, to rearrange your data storage for more efficiency. The most important thing to keep in mind is to use partitioning from the start.
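One way to picture time-based partitioning is as a rule that routes each timestamp to a bounded chunk of the data, so that a date-range chart only touches the chunks it needs. The sketch below assumes monthly partitions in UTC and a hypothetical base table name `stats`; the naming scheme is purely illustrative, and the timestamps passed in must be timezone-aware.

```python
from datetime import datetime, timezone


def partition_name(ts: datetime, base: str = "stats") -> str:
    """Name of the monthly partition a timezone-aware timestamp falls into."""
    utc = ts.astimezone(timezone.utc)
    return f"{base}_{utc:%Y_%m}"


def partition_bounds(ts: datetime) -> tuple[datetime, datetime]:
    """Half-open [start, end) UTC range covered by that monthly partition."""
    utc = ts.astimezone(timezone.utc)
    start = datetime(utc.year, utc.month, 1, tzinfo=timezone.utc)
    if utc.month == 12:
        end = datetime(utc.year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(utc.year, utc.month + 1, 1, tzinfo=timezone.utc)
    return start, end


if __name__ == "__main__":
    sample = datetime(2026, 6, 13, 2, 20, 46, tzinfo=timezone.utc)
    print(partition_name(sample))
    print(partition_bounds(sample))
```

At one row per second a monthly partition holds on the order of 2.6 million rows, so old months can be dropped or archived as a unit instead of deleted row by row.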


© 2021 The Archive Base. All Rights Reserved