Here is an exercise (3-15) in the book Algorithm Design Manual. Design a data

Question

0

Asked: May 30, 20262026-05-30T22:18:14+00:00 2026-05-30T22:18:14+00:00

Here is an exercise (3-15) in the book Algorithm Design Manual. Design a data

0

Here is an exercise (3-15) in the book “Algorithm Design Manual”.

Design a data structure that allows one to search, insert, and delete an integer X in O(1) time (i.e. , constant time, independent of the total number of integers stored). Assume that 1 ≤ X ≤ n and that there are m + n units of space available, where m is the maximum number of integers that can be in the table at any one time. (Hint: use two arrays A[1..n] and B[1..m].) You are not allowed to initialize either A or B, as that would take O(m) or O(n) operations. This means the arrays are full of random garbage to begin with, so you must be very careful.

I am not really seeking for the answer, because I don’t even understand what this exercise asks.

From the first sentence:

Design a data structure that allows one to search, insert, and delete an integer X in O(1) time

I can easily design a data structure like that. For example:

Because 1 <= X <= n, so I just have an bit vector of n slots, and let X be the index of the array, when insert, e.g., 5, then a[5] = 1; when delete, e.g., 5, then a[5] = 0; when search, e.g.,5, then I can simply return a[5], right?

I know this exercise is harder than I imagine, but what’s the key point of this question?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-30T22:18:15+00:00

You are basically implementing a multiset with bounded size, both in number of elements (#elements <= m), and valid range for elements (1 <= elementValue <= n).

Search: myCollection.search(x) –> return True if x inside, else False
Insert: myCollection.insert(x) –> add exactly one x to collection
Delete: myCollection.delete(x) –> remove exactly one x from collection

Consider what happens if you try to store 5 twice, e.g.

myCollection.insert(5)
myCollection.insert(5)

That is why you cannot use a bit vector. But it says “units” of space, so the elaboration of your method would be to keep a tally of each element. For example you might have [_,_,_,_,1,_,...] then [_,_,_,_,2,_,...].

Why doesn’t this work however? It seems to work just fine for example if you insert 5 then delete 5… but what happens if you do .search(5) on an uninitialized array? You are specifically told you cannot initialize it, so you have no way to tell if the value you’ll find in that piece of memory e.g. 24753 actually means “there are 24753 instances of 5” or if it’s garbage.

NOTE: You must allow yourself O(1) initialization space, or the problem cannot be solved. (Otherwise a .search() would not be able to distinguish the random garbage in your memory from actual data, because you could always come up with random garbage which looked like actual data.) For example you might consider having a boolean which means “I have begun using my memory” which you initialize to False, and set to True the moment you start writing to your m words of memory.

If you’d like a full solution, you can hover over the grey block to reveal the one I came up with. It’s only a few lines of code, but the proofs are a bit longer:

SPOILER: FULL SOLUTION

Setup:

Use N words as a dispatch table: locationOfCounts[i] is an array of size N, with values in the range location=[0,M]. This is the location where the count of i would be stored, but we can only trust this value if we can prove it is not garbage. >!
(sidenote: This is equivalent to an array of pointers, but an array of pointers exposes you being able to look up garbage, so you’d have to code that implementation with pointer-range checks.)

To find out how many is there are in the collection, you can look up the value counts[loc] from above. We use M words as the counts themselves: counts is an array of size N, with two values per element. The first value is the number this represents, and the second value is the count of that number (in the range [1,m]). For example a value of (5,2) would mean that there are 2 instances of the number 5 stored in the collection.

(M words is enough space for all the counts. Proof: We know there can never be more than M elements, therefore the worst-case is we have M counts of value=1. QED)

(We also choose to only keep track of counts >= 1, otherwise we would not have enough memory.)

Use a number called numberOfCountsStored that IS initialized to 0 but is updated whenever the number of item types changes. For example, this number would be 0 for {}, 1 for {5:[1 times]}, 1 for {5:[2 times]}, and 2 for {5:[2 times],6:[4 times]}.

1 2 3 4 5 6 7 8...

locationOfCounts[<N]: [☠, ☠, ☠, ☠, ☠, 0, 1, ☠, ...]

counts[<M]: [(5,⨯2), (6,⨯4), ☠, ☠, ☠, ☠, ☠, ☠, ☠, ☠..., ☠]

numberOfCountsStored: 2

Below we flush out the details of each operation and prove why it’s correct:

Algorithm:

There are two main ideas: 1) we can never allow ourselves to read memory without verifying that is not garbage first, or if we do we must be able to prove that it was garbage, 2) we need to be able to prove in O(1) time that the piece of counter memory has been initialized, with only O(1) space. To go about this, the O(1) space we use is numberOfItemsStored. Each time we do an operation, we will go back to this number to prove that everything was correct (e.g. see ★ below). The representation invariant is that we will always store counts in counts going from left-to-right, so numberOfItemsStored will always be the maximum index of the array that is valid.

.search(e) — Check locationsOfCounts[e]. We assume for now that the value is properly initialized and can be trusted. We proceed to check counts[loc], but first we check if counts[loc] has been initialized: it’s initialized if 0<=loc<numberOfCountsStored (if not, the data is nonsensical so we return False). After checking that, we look up counts[loc] which gives us a number,count pair. If number!=e, we got here by following randomized garbage (nonsensical), so we return False (again as above)… but if indeed number==e, this proves that the count is correct (★proof: numberOfCountsStored is a witness that this particular counts[loc] is valid, and counts[loc].number is a witness that locationOfCounts[number] is valid, and thus our original lookup was not garbage.), so we would return True.

.insert(e) — Perform the steps in .search(e). If it already exists, we only need to increment the count by 1. However if it doesn’t exist, we must tack on a new entry to the right of the counts subarray. First we increment numberOfCountsStored to reflect the fact that this new count is valid: loc = numberOfCountsStored++. Then we tack on the new entry: counts[loc] = (e,⨯1). Finally we add a reference back to it in our dispatch table so we can look it up quickly locationOfCounts[e] = loc.

.delete(e) — Perform the steps in .search(e). If it doesn’t exist, throw an error. If the count is >= 2, all we need to do is decrement the count by 1. Otherwise the count is 1, and the trick here to ensure the whole numberOfCountsStored–counts[...] invariant (i.e. everything remains stored on the left part of counts) is to perform swaps. If deletion would get rid of the last element, we will have lost a counts pair, leaving a hole in our array: [countPair0, countPair1, _hole_, countPair2, countPair{numberOfItemsStored-1}, ☠, ☠, ☠..., ☠]. We swap this hole with the last countPair, decrement numberOfCountsStored to invalidate the hole, and update locationOfCounts[the_count_record_we_swapped.number] so it now points to the new location of the count record.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

Here is an exercise (3-15) in the book Algorithm Design Manual. Design a data

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply