I have to implement a set ADT for a pair of strings. The interface

Question

Asked: May 26, 2026 (2026-05-26T15:18:44+00:00)

I have to implement a set ADT for a pair of strings. The interface I want is (in Java):

public interface StringSet {
  void add(String a, String b);
  boolean contains(String a, String b);
  void remove(String a, String b);
}

The data access pattern has the following properties:

The contains operation is far more frequent than the add and remove operations.
More often than not, contains returns true, i.e. the search is successful.

A simple implementation I can think of is a two-level hashtable, i.e. HashMap<String, HashMap<String, Boolean>>. But this data structure makes no use of the two peculiarities of the access pattern. I am wondering whether there is something more efficient than the hashtable, perhaps something that exploits those peculiarities.
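For reference, the two-level hashtable baseline could be sketched roughly as follows. This is only an illustration: the class name NestedHashPairSet is made up, and the inner level is a HashSet rather than a HashMap<String, Boolean>, on the assumption that only the presence of the second string matters. The three methods mirror the StringSet interface above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical baseline: first string -> set of second strings.
class NestedHashPairSet {
  private final Map<String, Set<String>> pairs = new HashMap<>();

  public void add(String a, String b) {
    // Create the inner set on first use of 'a', then record 'b' in it.
    pairs.computeIfAbsent(a, key -> new HashSet<>()).add(b);
  }

  public boolean contains(String a, String b) {
    Set<String> seconds = pairs.get(a);
    return seconds != null && seconds.contains(b);
  }

  public void remove(String a, String b) {
    Set<String> seconds = pairs.get(a);
    // Drop the inner set once it becomes empty so the outer map does not grow forever.
    if (seconds != null && seconds.remove(b) && seconds.isEmpty()) {
      pairs.remove(a);
    }
  }
}
```

Each contains call hashes both strings, which already touches every character of each, so any alternative has to beat two O(k) hash computations plus two bucket lookups.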

1 Answer

Editorial Team · Answer 1 · 2026-05-26T15:18:45+00:00

Do not use ordinary search trees (which is what most standard library ordered containers are) for this. They rely on one simple assumption that will hurt you in this case:

The usual O(log(n)) bound for operations on trees assumes that comparisons are O(1). This is true for integers and most other keys, but not for strings. For strings, each comparison is O(k), where k is the length of the string, so a tree operation is really O(k log(n)). This makes all operations depend on the string length, which is easily overlooked and will most likely hurt you if you need to be fast.

This is especially true if most lookups succeed: a successful search may have to compare up to k characters at each level of the tree, so with this access pattern you pay the full penalty of string keys in trees.
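To make the hidden O(k) factor visible, here is a small sketch that counts character comparisons during a TreeSet lookup. The names CountingComparator and TreeCostDemo, the shared prefix, and the key count are all made up for illustration; the comparator orders strings the same way String.compareTo does, it just counts along the way.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical helper: compares strings character by character and counts
// how many character pairs it had to inspect.
class CountingComparator implements Comparator<String> {
  long charComparisons = 0;

  @Override
  public int compare(String x, String y) {
    int n = Math.min(x.length(), y.length());
    for (int i = 0; i < n; i++) {
      charComparisons++;
      int d = x.charAt(i) - y.charAt(i);
      if (d != 0) return d;   // first differing character decides the order
    }
    return x.length() - y.length();
  }
}

class TreeCostDemo {
  public static void main(String[] args) {
    CountingComparator cmp = new CountingComparator();
    TreeSet<String> keys = new TreeSet<>(cmp);
    String prefix = "a/long/shared/prefix/";   // assumed: keys share a long prefix
    for (int i = 0; i < 1000; i++) {
      keys.add(prefix + i);
    }
    cmp.charComparisons = 0;                   // measure only the lookup below
    String target = prefix + 500;
    boolean found = keys.contains(target);
    System.out.println(found + ": " + cmp.charComparisons
        + " character comparisons for a key of length " + target.length());
  }
}
```

A successful lookup has to inspect at least every character of the key once (for the final equality check), and when keys share long prefixes it re-reads that prefix at each level of the tree on the way down.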

Your access pattern is easily handled by a trie. Testing whether a string is contained is O(k) in the worst case (not just in the average case, as in a hash map). Adding a string is also O(k). Since you are storing two strings, I would suggest that you don't index your trie by characters but by some larger type, so that you can add two special index values: one for the end of the first string and one for the end of both strings.

In your case, using these two extra symbols would also allow for simple removal: just delete the final node containing the end symbol, and your string will no longer be found. You will waste some memory, because the deleted strings are still stored in your structure. If this is a problem, you could keep track of the number of deleted strings and rebuild your trie when this gets too bad.
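A minimal sketch of such a trie might look like the following. Everything here is an assumption made for illustration: the class name PairTrie, the use of int symbols with the two marker values placed just above the char range, and a HashMap per node for the children. The three public methods mirror the interface from the question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical trie over int symbols: chars plus two special end markers.
class PairTrie {
  // Marker values chosen outside the char range (0..65535) so they cannot
  // collide with any real character.
  private static final int END_FIRST = Character.MAX_VALUE + 1;
  private static final int END_BOTH = Character.MAX_VALUE + 2;

  private static final class Node {
    final Map<Integer, Node> children = new HashMap<>();
  }

  private final Node root = new Node();

  public void add(String a, String b) {
    Node n = walk(root, a, true);
    n = n.children.computeIfAbsent(END_FIRST, key -> new Node());
    n = walk(n, b, true);
    n.children.putIfAbsent(END_BOTH, new Node());
  }

  public boolean contains(String a, String b) {
    Node n = find(a, b);
    return n != null && n.children.containsKey(END_BOTH);
  }

  public void remove(String a, String b) {
    Node n = find(a, b);
    // Only the end marker is deleted; the character nodes stay behind.
    if (n != null) n.children.remove(END_BOTH);
  }

  // Returns the node reached after a, END_FIRST, b, or null if the path is missing.
  private Node find(String a, String b) {
    Node n = walk(root, a, false);
    if (n == null) return null;
    n = n.children.get(END_FIRST);
    if (n == null) return null;
    return walk(n, b, false);
  }

  // Follows (and optionally creates) one node per character of s.
  private static Node walk(Node n, String s, boolean create) {
    for (int i = 0; i < s.length() && n != null; i++) {
      int sym = s.charAt(i);
      n = create
          ? n.children.computeIfAbsent(sym, key -> new Node())
          : n.children.get(sym);
    }
    return n;
  }
}
```

The END_FIRST marker is what keeps ("ab", "c") distinct from ("a", "bc"). As described above, remove leaves dead branches in place, so a real implementation would probably count removals and rebuild at some threshold; that part is left out of this sketch.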

P.S. A trie can be thought of as a combination of a tree and several hashtables, so it gives you the best of both data structures.
