Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8529047
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 11, 20262026-06-11T08:56:27+00:00 2026-06-11T08:56:27+00:00

I have a table with sorted numbers like: 1 320102 2 5200100 3 92010023

  • 0

I have a table with sorted numbers like:

1 320102
2 5200100
3 92010023
4 112010202
5 332020201
6 332020411
: 
5000000000 3833240522044511
5000000001 3833240522089999
5000000002 4000000000213312

Given the record number I need the value in O(log n) time. The record number is 64-bit long and there are no missing record numbers. The values are 64-bit long, they are sorted and value(n) < value(n+1).

The obvious solution is simply doing an array and use the records number as index. This will cost 64-bit per value.

But I would like a more space efficient way of doing that. Since we know the values are always increasing that should be doable, but I do not remember a data structure that lets me do that.

A solution would be to use deflate on the array, but that will not give me O(log n) for accessing an element – thus unacceptable.

Do you know of a data structure that will give me:

  • O(log n) for access
  • space requirement < 64-bit/value

= Edit =

Since we know all numbers in advance we could find the difference between each number. By taking the 99th percentile of these differences we will get a relatively modest number. Taking the log2 will give us the number of bits needed to represent modest number – let us call that modest-bits.

Then create this:

64-bit value of record 0
64-bit value of record 1024
64-bit value of record 2048
64-bit value of record 3072
64-bit value of record 4096

Then a delta table for all records:

modest-bits difference to record 0
modest-bits difference to previous record
1022 * modest-bits difference to previous record
modest-bits difference to record 1024

modest-bits difference to record k*1024 will always be 0, so we can use that for signaling. If it is non-zero, then the following 64-bit will be a pointer to a simple array for the next 1024 records as 64-bit values.

As the modest value is chosen as the 99th percentile number, that will at most happen 1% of the time, thus wasting at most 1% * n * modest-bits + 1% * n * 64-bit * 1024.

space: O(modest-bits * n + 64-bit * n / 1024 + 1% * n * modest-bits + 1% * n * 64-bit * 1024)

lookup: O(1 + 1024)

(99% and 1024 may have to be adjusted)

= Edit2 =

Based on the idea above, but wasting less space. Create this:

64-bit value of record 0
64-bit value of record 1024
64-bit value of record 2048
64-bit value of record 3072
64-bit value of record 4096

And for all value that cannot be represented by modest-bits create big-value table as a tree:

64-bit position, 64-bit value
64-bit position, 64-bit value
64-bit position, 64-bit value

Then a delta table for all records, that is reset for every 1024 records:

modest-bits difference to record 0
modest-bits difference to previous record
1022 * modest-bits difference to previous record
modest-bits difference to record 1024

but also reset for every value that is in the big-value table.

space: O(modest-bits * n + 64-bit * n / 1024 + 1% * n * 2 * 64-bit).

Lookup requires searching big-value table, then looking up the 1024’th value and finally summing up the modest-bits values.

lookup: O(log(big-value table) + 1 + 1024) = O(log n)

Can you improve this? Or do better in a different way?

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-11T08:56:28+00:00Added an answer on June 11, 2026 at 8:56 am

    OP proposes splitting numbers into blocks (only once). But this process may be continued. Split every block once more. And again… Finally we might get a binary trie.

    enter image description here

    Root node contains value of the number with least index. Its right descendant stores difference between the middle number in the table and the number with least index: d = A[N/2] - A[0] - N/2. This is continued for other right descendants (red nodes on diagram). Leaf nodes contain deltas from preceding numbers: d = A[i+1] - A[i] - 1.

    So most of the values, stored in trie, are delta values. Each of them occupies less than 64 bits. And for compactness they may be stored as variable-bit-length numbers in a bit stream. To get length of each number and to navigate in this structure in O(log N) time, bit stream should also contain lengths of (some) numbers and (some) subtrees:

    1. Each node contains length (in bits) of its left sub-tree (if it has one).
    2. Each right descendant (red nodes on diagram), except leaf nodes, contains length (in bits) of its value. Leaf node’s length may be calculated from other lengths on the path from root to this node.
    3. Each right descendant (red nodes on diagram) contains difference of correspondent value and the value of nearest “red” node up the path.
    4. All nodes are packed in bit stream, starting from root node, in-order: left descendant always follows its ancestor; right descendant follows sub-tree, rooted by left descendant.

    To access element given its index, use index’s binary representation to follow path in the trie. While traversing this path, add together all values of “red” nodes. Stop when no more non-zero bits are left in the index.

    There are several options to store N/2 value lengths:

    1. Allocate as many bits for each length as needed to represent all values from the largest length to somewhere below mean length (excluding some very short outliers).
    2. Also exclude some long outliers (keep them in a separate map).
    3. Since lengths may be not evenly distributed, it’s reasonable to use Huffman encoding for value lengths.

    Either fixed length or Huffman encodings should be different for each trie depth.

    N/4 subtree lengths are, in fact, value lengths, because N/4 smallest subtrees contain a single value.

    Other N/4 subtree lengths may be stored in words of fixed (predefined) length, so that for large subtrees we know only approximate (rounded up) lengths.

    For 230 full-range 64-bit numbers we have to pack approximately 34-bit values, for 3/4 nodes, approx. 4-bit value lengths, and for every fourth node, 10-bit subtree lengths. Which saves 34% space.


    Example values:

    0 320102
    1 5200100
    2 92010023
    3 112010202
    4 332020201
    5 332020411
    6 3833240522044511
    7 3833240522089999
    8 4000000000213312
    

    Trie for these values:

    root               d=320102           vl=19    tl=84+8+105+4+5=206
       +-l                                         tl=75+4+5=84
       | +-l                                       tl=23
       | | +-l
       | | | +-r       d=4879997          (vl=23)
       | | +-r         d=91689919         vl=27
       | |   +-r       d=20000178         (vl=25)
       | +-r           d=331700095        vl=29    tl=8
       |   +-l
       |   | +-r       d=209              (vl=8)
       |   +-r         d=3833240190024308 vl=52
       |     +-r       d=45487            (vl=16)
       +-r             d=3999999999893202 vl=52
    

    Value length encoding:

               bits start end
    Root       0    19    19
    depth 1    0    52    52
    depth 2    0    29    29
    depth 3    5    27    52
    depth 4    4    8     23
    

    Sub-tree lengths need 8 bits each.

    Here is encoded stream (binary values still shown in decimal for readability):

    bits value                      comment
    19   320102                     root value
    8    206                        left subtree length of the root
    8    84                         left subtree length
    4    15                         smallest left subtree length (with base value 8)
    23   4879997                    value for index 1
    5    0                          value length for index 2 (with base value 27)
    27   91689919                   value for index 2
    25   20000178                   value for index 3
    29   331700095                  value for index 4
    4    0                          smallest left subtree length (with base value 8)
    8    209                        value for index 5
    5    25                         value length for index 6 (with base value 27)
    52   3833240190024308           value for index 6
    16   45487                      value for index 7
    52   3999999999893202           value for index 8
    

    Altogether 285 bits or 5 64-bit words. We also need to store bits/start values from value length encoding table (350 bits). To store 635 bits we need 10 64-bit words, which means such a small number table cannot be compressed. For larger number tables, size of value length encoding table is negligible.

    To search a value for index 7, read root value (320102), skip 206 bits, add value for index 4 (331700095), skip 8 bits, add value for index 6 (3833240190024308), add value for index 7 (45487), and add index (7). The result is 3 833 240 522 089 999, as expected.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have a table that has data like below(sorted by date) COL_A | COL_B
I have a MySQL table like this: Payments +----+---------------------+---------+ | id | date (sorted
I have a table with a SortID column that is numbered using consecutive numbers.
Is a table intrinsically sorted by it's primary key? If I have a table
I have table of data that is sorted as follows: Item | Sample |
I have a table of data that is being sorted alphabetically, and it's a
I have a table with the following (I've sorted it by date): ID prodId
I have a table which looks like Col1 col2 col3 col4 col5 1 5
I have a table I'd like to do paging and ordering on and was
I have a table of data that looks like the following: ArtistName TrackName TrackID

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.