I want to create a record that would hold the information about
- a) what kind of elements are present and
- b) the number of elements of each kind present
in a node of a tree. I would explicitly store this information only for the leaf nodes, while the information for a parent node can be obtained by combining the information of all of its children (e.g. child 1 has 3 objects of A and 1 object of B, child 2 has 1 object of A and 2 objects of C, so the parent has 4 objects of A, 1 object of B and 2 objects of C).
When requesting this information from parent nodes, I will take care not to request, use and discard the information for a child node and then request it again for its parent, but the upward construction will be a common operation. Three other common operations follow directly from what I store: is an object of kind X present, how many objects of kind X are present, and how many kinds of objects are present?
Object kinds are represented as integers, and the object counts are always integer values. Which of the following is the better choice, and what are the arguments for it?
- use `std::multiset<int>` and operate with the `std::multiset::count()` and `std::multiset::find()` operations (easier union, but duplication of elements, and the total distinct element count is hard to obtain)
- use `std::map<int, std::size_t>` with the kind as the key and the number of objects as the value (no duplicate elements, `std::map::find()` is available, and the size gives the correct number of object kinds stored, but accessing a non-existent element increases the size unintentionally)
Thank you for your suggestions!
To store a total of n items with k distinct values per your comparison predicate, an `std::multiset` allocates n binary search tree nodes(*). An `std::map` allocates only k (slightly larger) nodes.

You’d use `std::multiset` when two items can be considered equal by your comparison predicate, but must still be explicitly stored, because they differ in some aspect that the comparison predicate does not check. Also, iterating over a `multiset` generates each of the n items, whereas a `map` would generate each of the k distinct items with the count for each.

In the case where the items are just integers, go with `std::map`. Your “how many distinct items” query would then just be a call to `size`, which runs in constant time.

Your claim that “accessing a non-existent element increases the size unintentionally” is only true if you use `operator[]` to access nodes. `find` does not exhibit this behavior.

(*) The C++ standard does not guarantee that these containers are implemented as (balanced) BSTs, but in all implementations that I’ve seen, they are.
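
As a rough illustration of the `std::map` approach, here is a minimal sketch of the upward construction and the three queries. The names `KindCounts`, `merge_into`, `has_kind` and `count_of` are made up for this example and are not part of any library. The sketch assumes that summing the children's counts is all the parent needs.

```cpp
#include <cstddef>
#include <map>

// Hypothetical alias for this sketch: object kind -> number of objects of that kind.
using KindCounts = std::map<int, std::size_t>;

// Upward construction: add every count stored in `child` to `parent`.
// operator[] is fine here, because inserting a missing kind is exactly what we want;
// the new entry is value-initialized to 0 before the count is added.
void merge_into(KindCounts& parent, const KindCounts& child) {
    for (const auto& entry : child) {
        parent[entry.first] += entry.second;
    }
}

// "Is an object of kind X present?" Uses find(), so the map is never modified.
bool has_kind(const KindCounts& counts, int kind) {
    return counts.find(kind) != counts.end();
}

// "How many objects of kind X are present?" Returns 0 for a missing kind
// without inserting an entry for it.
std::size_t count_of(const KindCounts& counts, int kind) {
    const auto it = counts.find(kind);
    return it == counts.end() ? 0 : it->second;
}

// "How many kinds of objects are present?" is simply counts.size().
```

With the example from the question, merging the children {A: 3, B: 1} and {A: 1, C: 2} into an empty parent gives {A: 4, B: 1, C: 2}, and `size()` on the parent reports 3 kinds. Asking `has_kind` or `count_of` about a kind that is not stored leaves `size()` unchanged, whereas writing `parent[kind]` for that kind would add a zero-count entry and increase `size()` by one. Because the query helpers take the map by `const` reference, `operator[]` cannot be called on it there by accident, since `std::map` has no `const` overload of `operator[]`.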