Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 31509
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 10, 20262026-05-10T13:36:58+00:00 2026-05-10T13:36:58+00:00

I have an information retrieval application that creates bit arrays on the order of

  • 0

I have an information retrieval application that creates bit arrays on the order of 10s of million bits. The number of ‘set’ bits in the array varies widely, from all clear to all set. Currently, I’m using a straight-forward bit array (java.util.BitSet), so each of my bit arrays takes several megabytes.

My plan is to look at the cardinality of the first N bits, then make a decision about what data structure to use for the remainder. Clearly some data structures are better for very sparse bit arrays, and others when roughly half the bits are set (when most bits are set, I can use negation to treat it as a sparse set of zeroes).

  • What structures might be good at each extreme?
  • Are there any in the middle?

Here are a few constraints or hints:

  1. The bits are set only once, and in index order.
  2. I need 100% accuracy, so something like a Bloom filter isn’t good enough.
  3. After the set is built, I need to be able to efficiently iterate over the ‘set’ bits.
  4. The bits are randomly distributed, so run-length–encoding algorithms aren’t likely to be much better than a simple list of bit indexes.
  5. I’m trying to optimize memory utilization, but speed still carries some weight.

Something with an open source Java implementation is helpful, but not strictly necessary. I’m more interested in the fundamentals.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. 2026-05-10T13:36:59+00:00Added an answer on May 10, 2026 at 1:36 pm

    Unless the data is truly random and has a symmetric 1/0 distribution, then this simply becomes a lossless data compression problem and is very analogous to CCITT Group 3 compression used for black and white (i.e.: Binary) FAX images. CCITT Group 3 uses a Huffman Coding scheme. In the case of FAX they are using a fixed set of Huffman codes, but for a given data set, you can generate a specific set of codes for each data set to improve the compression ratio achieved. As long as you only need to access the bits sequentially, as you implied, this will be a pretty efficient approach. Random access would create some additional challenges, but you could probably generate a binary search tree index to various offset points in the array that would allow you to get close to the desired location and then walk in from there.

    Note: The Huffman scheme still works well even if the data is random, as long as the 1/0 distribution is not perfectly even. That is, the less even the distribution, the better the compression ratio.

    Finally, if the bits are truly random with an even distribution, then, well, according to Mr. Claude Shannon, you are not going to be able to compress it any significant amount using any scheme.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Ask A Question

Stats

  • Questions 76k
  • Answers 76k
  • Best Answers 0
  • User 1
  • Popular
  • Answers
  • Editorial Team

    How to approach applying for a job at a company ...

    • 7 Answers
  • Editorial Team

    How to handle personal stress caused by utterly incompetent and ...

    • 5 Answers
  • Editorial Team

    What is a programmer’s life like?

    • 5 Answers
  • added an answer The user agent doesn't change how Safari renders a page,… May 11, 2026 at 2:59 pm
  • added an answer As stated on the Infragistics forum, the XamDataGrid exposes an… May 11, 2026 at 2:59 pm
  • added an answer By using the correct command line parameter :) firefox -install-global-extension… May 11, 2026 at 2:59 pm

Related Questions

I am part of a team creating a web application using PHP and MySQL.
I want to create a product catalog that allows for intricate details on each
I have an application that imports data from a pickled file. It works just
I'm experiencing an error when using the windows installer to install an event source

Trending Tags

analytics british company computer developers django employee employer english facebook french google interview javascript language life php programmer programs salary

Top Members

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.