I am implementing a flexible array in C. All ideas are based upon this small paper.
My error is with the indexing operation. I have been hacking away at it for longer than I would like to admit, so I am asking for knowledgeable minds and eyes.
Overview:
The data structure consists of one flat index block which holds pointers to the data blocks. Each data block has size 2^ceil(k/2), where k is the position of the leading set bit of i+1. So when I search for element “i”, k is floor(log_2(i+1)).
index = {
[0], -> [0]
[2], -> [0,1]
[2], -> [0,1]
[3], -> [0,1,2,3]
[4], -> [0,1,2,3]
....}
The size of each data block is determined by the “superblock” it is clustered into, where a superblock is made up of data blocks of the same size. So index 1 and 2 are in the same superblock (superblock 1), while index 0 (superblock 0) and index 3 (superblock 2) are not.
You end up with each superblock having 2^(floor(k/2)) data blocks, and each of those data blocks has size 2^(ceil(k/2)).
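To make the layout concrete, here is a minimal sketch of those two formulas in C. The function names are hypothetical (they are not from the paper or from my test case), and it assumes k stays below 32 so everything fits in an unsigned int.

```c
#include <assert.h>

/* Number of data blocks in superblock k: 2^floor(k/2). */
unsigned sb_block_count(unsigned k)
{
    assert(k < 32);               /* assumption: results must fit in 32 bits */
    return 1u << (k / 2);
}

/* Size of each data block in superblock k: 2^ceil(k/2). */
unsigned sb_block_size(unsigned k)
{
    assert(k < 32);
    return 1u << ((k + 1) / 2);   /* (k + 1) / 2 is ceil(k/2) for unsigned k */
}

/* Elements held by a full superblock k: blocks * size = 2^k. */
unsigned sb_element_count(unsigned k)
{
    return sb_block_count(k) * sb_block_size(k);
}
```

For k = 0..4 these give 1, 1, 2, 2, 4 data blocks of sizes 1, 2, 2, 4, 4.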
The Problem
When r is a power of 2, the index skips past where it should be.
For example, when I search for 3 it should be index[2][0]; instead it is index[3][0].
Why does this happen? Is there any way to avoid it? Is there an “off by 1” error I’m not seeing?
The Code
Here is the test case, a single main function. It is clear and simple, and it fails when trying to get the element at index 3.
The simplified code for locating index i is this:
/* Edited from the actual test case for extra clarity */
/* all vars int */
r = i + 1;
k = first_set_bit(r) - 1;           // ex: r = 5 = "00000101"b, so k is (3-1) = 2
b = first_subset_of_r(floor(k/2));  // the floor(k/2) bits of r immediately after the first set bit
e = last_subset_of_r(ceil(k/2));    // the last ceil(k/2) bits of r
p = (1 << k) - 1;                   // 2^k - 1
// Index is supposed to be found with...
return index[p+b][e];
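For reference, the bit-slicing part of the snippet above (k, b and e only) could be sketched in plain C roughly like this. The helper names are hypothetical stand-ins for first_set_bit, first_subset_of_r and last_subset_of_r, leading_bit is assumed to return the 0-based position directly (so there is no "- 1"), and p is deliberately left out.

```c
#include <assert.h>

/* 0-based position of the leading 1-bit of r, i.e. floor(log2(r)). */
unsigned leading_bit(unsigned r)
{
    assert(r != 0);               /* r = i + 1 is never 0 for a valid index */
    unsigned k = 0;
    while (r >>= 1)
        k++;
    return k;
}

/* b: the floor(k/2) bits of r immediately after the leading 1-bit. */
unsigned block_in_superblock(unsigned r)
{
    unsigned k = leading_bit(r);
    unsigned e_bits = (k + 1) / 2;           /* the low ceil(k/2) bits belong to e */
    unsigned b_mask = (1u << (k / 2)) - 1;   /* floor(k/2) one-bits */
    return (r >> e_bits) & b_mask;
}

/* e: the last ceil(k/2) bits of r. */
unsigned element_in_block(unsigned r)
{
    unsigned k = leading_bit(r);
    return r & ((1u << ((k + 1) / 2)) - 1);
}
```

For r = 4 (element 3) this yields k = 2, b = 0, e = 0, the same values as in the trace further down.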
Here is some real output: first the array’s contents printed by index and data block, then the output of four lookups into the index.
The first part dumps the 2D array, where the number before the bar is the index into the index block, and the part after it lists the elements contained in the data block that the index entry points to. All arrays are zero-indexed.
[clemensm@gaia:23]> ./a.out
Index block | element number (Not it's index!)
0 | 0
1 | 1 2
2 | 3 4
3 | 5 6
4 | 7 8 9 10
5 | 11 12 13 14
6 | 15 16 17 18
7 | 19 20 21 22
8 | 23 24 25 26
9 | 27 28 29 30
10 | 31 32 33 34 35 36 37 38
11 | 39 40 41 42 43 44 45 46
12 | 47 48 49 50 51 52 53 54
13 | 55 56 57 58 59 60 61 62
14 | 63 64 65 66 67 68 69 70
15 | 71 72 73 74 75 76 77 78
16 | 79 80 81 82 83 84 85 86
17 | 87 88 89 90 91 92 93 94
18 | 95 96 97 98 99 100 101 102
19 | 103 104 105 106 107 108 109 110
Finished element dump
Trying to get 0
R: [1]b
k/2=[0], Ceil(k,2)=[0]
K: [0] is the leading 1 bit
B: [0]
E: [0]
P: [0] data blocks prior to our superblock
p+b,e : [0,0]
Trying to get 1
R: [10]b
k/2=[0], Ceil(k,2)=[1]
K: [1] is the leading 1 bit
B: [0]
E: [0]
P: [1] data blocks prior to our superblock
p+b,e : [1,0]
Trying to get 2
R: [11]b
k/2=[0], Ceil(k,2)=[1]
K: [1] is the leading 1 bit
B: [0]
E: [1]
P: [1] data blocks prior to our superblock
p+b,e : [1,1]
Trying to get 3
R: [100]b
k/2=[1], Ceil(k,2)=[1]
K: [2] is the leading 1 bit
B: [0]
E: [0]
P: [3] data blocks prior to our superblock
p+b,e : [3,0]
a.out: test_array.c:81: main: Assertion `get_index(3)==3' failed.
Abort (core dumped)
Just for the search engines and for clarity, the paper's name is:
Resizable Arrays in Optimal Time and Space

As far as I can see from the paper, there is a very basic lesson to learn here: even a nice-looking paper can contain mistakes.
The Locate algorithm is clearly wrong. Line 3, which claims to calculate the number of data blocks prior to SB_k, cannot work this way. You can see this already in your example above. I suggest working out the formula yourself before reading on.
My analysis suggests this formula (floor(k/2) is plain integer division in C):
p = 2^(floor(k/2) + 1) - 2 + (k mod 2) * 2^floor(k/2)
Sample code:
Sample results:
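A minimal sketch of how a corrected lookup might look in C, assuming p is computed as 2^(floor(k/2) + 1) - 2 + (k mod 2) * 2^floor(k/2). The names blocks_before_superblock, struct location and locate are made up for this illustration and are not taken from the paper or from the question's test case.

```c
#include <assert.h>

/* Data blocks in all superblocks before SB_k:
 * p = 2^(floor(k/2) + 1) - 2 + (k mod 2) * 2^floor(k/2). */
unsigned blocks_before_superblock(unsigned k)
{
    assert(k < 32);               /* assumption: 32-bit unsigned arithmetic */
    unsigned half = k / 2;
    return (2u << half) - 2 + (k % 2) * (1u << half);
}

struct location {
    unsigned block;    /* entry in the index block: p + b */
    unsigned offset;   /* position inside that data block: e */
};

/* Map element number i to its data block and offset. */
struct location locate(unsigned i)
{
    unsigned r = i + 1;
    assert(r != 0);               /* guards against overflow of i + 1 */

    unsigned k = 0;               /* position of the leading 1-bit of r */
    unsigned t = r;
    while (t >>= 1)
        k++;

    unsigned e_bits = (k + 1) / 2;                        /* ceil(k/2) */
    unsigned b = (r >> e_bits) & ((1u << (k / 2)) - 1);   /* floor(k/2) bits */
    unsigned e = r & ((1u << e_bits) - 1);

    struct location loc = { blocks_before_superblock(k) + b, e };
    return loc;
}
```

With this sketch, locate(3) gives block 2, offset 0, which is what the failing assertion in the question expects, and locate(110) gives block 19, offset 7, the last entry of the dump.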
Edit: more details on how to arrive at the formula
How to get the formula (I suggest reading each step and checking whether you can continue on your own; if not, keep reading):
The paper says:
"When superblock SB_k is fully allocated, it consists of 2^floor(k/2) data blocks"
If we want the number of data blocks prior to SB_k, we have to sum over all earlier superblocks:
p = sum from i=0 to k-1 over 2^floor(i/2)
We could use this as it is; it translates directly into a for loop. But let's turn it into a single calculation.
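As a sketch, the for-loop version might look like this in C. The function name is hypothetical, and k is assumed to be small enough for the result to fit in an unsigned int.

```c
#include <assert.h>

/* Number of data blocks prior to SB_k, summed one superblock at a time:
 * superblock j contributes 2^floor(j/2) data blocks. */
unsigned blocks_before_superblock_loop(unsigned k)
{
    assert(k < 32);               /* assumption: keep the sum within 32 bits */
    unsigned p = 0;
    for (unsigned j = 0; j < k; j++)
        p += 1u << (j / 2);
    return p;
}
```

For k = 2 this returns 2 rather than 2^2 - 1 = 3, which is exactly the off-by-one seen when looking up element 3.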
If you look at the sum, every 2^i appears twice because of the floor. This holds completely for even k, because the upper limit k-1 of the sum is then odd, and hence
floor((k-2)/2) = floor((k-1)/2).
For odd k, one term is left over without a partner, and we have to handle it separately.
So, we have now:
p = 2 * (sum from i=0 to floor(k/2)-1 over 2^i) + (k mod 2) * 2^floor(k/2)
(make some examples for k=6,7,8 if you need some more details)
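Written out, those three examples could look as follows (a worked illustration of the pairing, with p denoting the number of data blocks prior to SB_k):

```latex
\begin{align*}
k = 6:\quad p &= 1 + 1 + 2 + 2 + 4 + 4 = 2\,(1 + 2 + 4) = 14 \\
k = 7:\quad p &= \underbrace{1 + 1 + 2 + 2 + 4 + 4}_{\text{pairs}} + \underbrace{8}_{\text{leftover } 2^{3}} = 14 + 8 = 22 \\
k = 8:\quad p &= 1 + 1 + 2 + 2 + 4 + 4 + 8 + 8 = 2\,(1 + 2 + 4 + 8) = 30
\end{align*}
```

The values 14 for k = 6 agrees with the dump in the question, where the first data block of size 8 that belongs to SB_6 is index block 14.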
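Now we can get rid of the sum, because we know that
sum from i=0 to n over 2^i = 2^(n+1) - 1
(this is a basic math/CS result; you can see its correctness from the binary representation, e.g. 1 + 2 + 4 + 8 = 1111b = 10000b - 1). We use this formula with n = floor(k/2) - 1 to get:
p = 2 * (2^floor(k/2) - 1) + (k mod 2) * 2^floor(k/2)
Now we can multiply out the 2, adjust the exponent, and we are finished:
p = 2^(floor(k/2) + 1) - 2 + (k mod 2) * 2^floor(k/2)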
I hope this helps.