Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 3351776
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 18, 20262026-05-18T01:56:58+00:00 2026-05-18T01:56:58+00:00

I need a memory-efficient data structure for for storing about a million key–value pairs,

  • 0

I need a memory-efficient data structure for for storing about a million key–value pairs, where keys are strings of about 80 bytes, and values are strings of about 200 bytes, the total key and value size being about 280MB. I also need efficient lookup of value by key, preferably a hash-map. The memory overhead should be as little as possible, e.g. for 280MB of useful data, the data structure shouldn’t use more than 300MB of virtual memory (including malloc() overhead and everything else). The usage pattern is the following: we start with an empty data structure, and we populate it gradually, never changing keys, and never changing the length of values. As a plus, the data structure may support changing the length of values, at the expense of a 100% value overhead (meaning that for x value bytes, x bytes might be wasted in temporarily in unused buffer space).

I need a pure Python module, or a built-in Python module, or a C implementation preferably with (C)Python bindings. I’d prefer if it was possible to serialize the whole data structure to disk, and to read it back very quickly.

Just to prove that such a small overhead is possible, I’ve created a simple design with open addressing, the hash table of 1.25 million elements containing 4-byte pointers to 1MB data blocks, the data blocks containing the key and value lengths as base-128 varints. This design has an important limitation: it doesn’t allow removing or changing pairs without wasting their memory area. According to my calculations with 1 million key–value pairs of 280 bytes each, the overhead is less than 3.6% (10 080 000 bytes). The limits above are more generous, they allow 20 000 000 bytes of overhead.

I’ve just found http://www.pytables.org/ , which provides fast access and memory-efficient packing of data. I have to examine it more closely to check if it suits my needs.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-18T01:56:59+00:00Added an answer on May 18, 2026 at 1:56 am

    Since I couldn’t find any existing solutions which will pack the memory tightly, I’ve decided to implement it in C for myself. See my design with open addressing in the question.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I need to implement an in-memory tuple-of-strings matching feature in C. There will be
I can never remember the number. I need a memory rule.
I need to do some memory profiling of a .NET Framework application. Should I
I need to configure Tomcat memory settings as part of a larger installation, so
I need to know when the memory will be allocated for a particular program.
I need to know how much physical memory a windows machine has, using Perl.
Background: I need to reserve an amount of memory below 0xA0000 prior to my
I need ideas to implement a (really) high performance in-memory Database/Storage Mechanism in Java.
I need ideas to implement a (really) high performance in-memory Database/Storage Mechanism. In the
Suppose you have a collection of a few hundred in-memory objects and you need

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.