I have a vector of Key-Value pairs, where each Key-Value pair is also tagged with an Entry Type code. The possible Entry Type codes are:
    enum Type
    {
        tData = 0,
        tSeqBegin = 1, // the beginning of a sequence
        tSeqEnd = 2    // the end of a sequence element
    };
So the Key-Value pair itself looks like this:
    struct KeyVal
    {
        int key_;
        string val_;
        Type type_;
    };
Within the vector are sub-arrays of additional Key-Value pairs. These sub-arrays are called ‘sequences’. Sequences can be nested to any level. So sequences can themselves have (optional) sub-sequences of varying lengths. The combination of a Key and Type is unique within a sequence element. That is, within a single sequence element there can only be one 269 data row, but other sequence elements can have their own 269 data rows.
Here is a graphical representation of some sample data, grossly oversimplified (If the ‘Type’ column is blank, it is of type tData):
    Row#  Type       Key   Value
    ----  ---------  ----  -------
    1                35    "W"
    2                1181  "IBM"
    3     tSeqBegin  268   "3"
    4                269   "0"
    5                270   "160.3"
    6     tSeqEnd    0
    7                269   "0"
    8                290   "0"
    9     tSeqBegin  453   "1"      <-- subsequence
    10    tSeqEnd    0              <-- end of subsequence
    11    tSeqEnd    0
    12               269   "0"
    13               290   "1"
    14               270   "160.4"
    15    tSeqEnd    0
    16               1759  "ABC"
[EDIT: A note on the above. There is one tSeqBegin that marks the beginning of the whole sequence. The end of each sequence element is marked by a tSeqEnd. But there is no special tSeqEnd that also marks the end of the whole sequence. So for a sequence you will see 1 tSeqBegin and n tSeqEnds, where n is the number of elements within the sequence.
Another note, in the above sequence beginning at row #3 and ending at row #15, there is one subsequence in the 2nd element (rows 7-11). The subsequence is empty, and occupies rows 9 and 10.]
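To make the layout described in this note concrete, here is a minimal C++11 sketch that splits one sequence into its elements. It rests on an assumption the text does not state outright: that the val_ of a tSeqBegin row holds the number of elements (268="3" is followed by three top-level tSeqEnds in the sample, and 453="1" by one). The names split_elements and ElementRange are hypothetical, and error handling for a non-numeric count is left out.

```cpp
#include <string>
#include <utility>
#include <vector>

enum Type { tData = 0, tSeqBegin = 1, tSeqEnd = 2 };

struct KeyVal
{
    int key_;
    std::string val_;
    Type type_;
};

using Iter = std::vector<KeyVal>::const_iterator;
// Half-open range [first, second) of one element; the closing tSeqEnd row is excluded.
using ElementRange = std::pair<Iter, Iter>;

// Hypothetical helper. seq_begin must point at a tSeqBegin row.
// ASSUMPTION: the val_ of a tSeqBegin row is the element count of that sequence.
std::vector<ElementRange> split_elements(Iter seq_begin, Iter end)
{
    std::vector<ElementRange> elements;
    if (seq_begin == end || seq_begin->type_ != tSeqBegin)
        return elements;

    int remaining = std::stoi(seq_begin->val_); // elements still to close
    std::vector<int> nested;                    // open subsequences: elements left in each
    Iter elem_start = seq_begin + 1;

    for (Iter it = elem_start; it != end && remaining > 0; ++it)
    {
        if (it->type_ == tSeqBegin)
        {
            int n = std::stoi(it->val_);
            if (n > 0)
                nested.push_back(n);            // a subsequence opens inside this element
        }
        else if (it->type_ == tSeqEnd)
        {
            if (!nested.empty())
            {
                if (--nested.back() == 0)       // this tSeqEnd belongs to a subsequence
                    nested.pop_back();
            }
            else
            {
                elements.push_back(ElementRange(elem_start, it));
                elem_start = it + 1;
                --remaining;
            }
        }
    }
    return elements;
}
```

Called with an iterator to row 3 of the sample table, this should yield three ranges covering rows 4-5, 7-10 and 12-14. The empty subsequence in rows 9-10 stays inside the second element.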
What I’m trying to do is find a sequence element which has multiple Key-Value matches to certain criteria. For example, suppose I want to find the sequence element that has both 269="0" and 290="0". In this case, it should not find element #0 (starting at row 3) because that element doesn’t have a 290=... row at all. It should find the element starting at row #7 instead. Ultimately I will extract other fields from this element, but that’s beyond the scope of this problem, so I haven’t included that data above.
I can’t use std::find_if() because find_if() will evaluate each row individually, not the whole sequence element as a unit. So I can’t construct a functor that evaluates something like if 269=="0" && 290=="0" because no single row will ever evaluate this to true.
I had thought to implement my own find_sequence_element(...) function. But this would involve some fairly complex logic. First I would have to identify the begin() and end() of the entire sequence, noting where each element begins and ends. Then I would have to construct some kind of evaluation structure that I could string together like this pseudocode:
Condition cond = KeyValueMatch(269, "0") + KeyValueMatch(290, "0");
But this is also complex. I can’t just construct a find_sequence_element() that takes exactly 2 parameters, one for the 269 match and another for the 290 match, because I want to use this algorithm for other sequences as well, with more or fewer conditions.
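For what it's worth, one way the pseudocode above could be made to compile is sketched below in C++11. This is only an illustration under assumptions: KeyValueMatch and Condition are hypothetical types modelled on the pseudocode, and "+" is taken to mean "and". As a simplification, the condition scans every tData row in the range it is given, including rows that sit inside nested subsequences.

```cpp
#include <algorithm>
#include <string>
#include <vector>

enum Type { tData = 0, tSeqBegin = 1, tSeqEnd = 2 };

struct KeyVal
{
    int key_;
    std::string val_;
    Type type_;
};

// Hypothetical row-level predicate: true for a data row with this key and value.
struct KeyValueMatch
{
    int key;
    std::string val;
    KeyValueMatch(int k, const std::string& v) : key(k), val(v) {}
    bool operator()(const KeyVal& kv) const
    {
        return kv.type_ == tData && kv.key_ == key && kv.val_ == val;
    }
};

// Hypothetical element-level condition: every stored match must be found
// somewhere in [begin, end). Holds any number of matches.
class Condition
{
public:
    Condition(const KeyValueMatch& m) : matches_(1, m) {}

    Condition& operator+=(const KeyValueMatch& m)
    {
        matches_.push_back(m);
        return *this;
    }

    template <class It>
    bool operator()(It begin, It end) const
    {
        for (const KeyValueMatch& m : matches_)
            if (std::find_if(begin, end, m) == end)
                return false;   // one required pair is missing
        return true;
    }

private:
    std::vector<KeyValueMatch> matches_;
};

// Lets conditions be chained: KeyValueMatch(a, x) + KeyValueMatch(b, y) + ...
inline Condition operator+(Condition lhs, const KeyValueMatch& rhs)
{
    lhs += rhs;
    return lhs;
}
```

With this, Condition cond = KeyValueMatch(269, "0") + KeyValueMatch(290, "0"); builds a condition that can be applied to one element's range as cond(elem_begin, elem_end), and a third or fourth match can be appended with another "+".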
Moreover, it seems like I should be able to use the algorithms that already exist in the STL’s <algorithm> header. While I know the STL rather well, I can’t see a straightforward way to apply find_if() here.
So, finally, here’s the question. If you were faced with the above problem, how would you solve it? I know the question is vague. I’m hoping that with some discussion we can narrow the problem domain down until we have an answer.
Some conditions:
- I cannot change the single flat vector to a vector of vectors or anything of the like. The reasons for this are complex.
- (Placeholder for more conditions 🙂 )
(If consensus is that this should be CW, I will mark it as such)
Hoping I understand your setup correctly, I would proceed in two steps, nesting search algorithms: a find_if-style outer search over the sequence elements, and an inner search within each element. The outer search is like find_if, except that Pr here is a predicate that takes a sequence element as a range [begin, end) and returns whether that element matches, yes or no.
For a single match, Pr could call std::find_if over [begin, end) with an item_predicate() suitable to find the (key_, val_) pair.
If you’re interested in finding a sequence with two pairs, write a HasPairs predicate that invokes std::find_if twice, or some more optimized version of a search for two elements.
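A minimal C++11 sketch of this two-step idea follows. It is not the original code from this answer. The names find_sequence_element, item_predicate and HasPairs are taken from the discussion, but their signatures here are assumptions. The outer walk also assumes that the val_ of a tSeqBegin row is the element count, which the sample data suggests but the question does not state. The inner search looks at every tData row of an element, including rows inside nested subsequences.

```cpp
#include <algorithm>
#include <string>
#include <vector>

enum Type { tData = 0, tSeqBegin = 1, tSeqEnd = 2 };

struct KeyVal
{
    int key_;
    std::string val_;
    Type type_;
};

using Iter = std::vector<KeyVal>::const_iterator;

// Inner predicate: matches one data row by key and value.
struct item_predicate
{
    int key;
    std::string val;
    bool operator()(const KeyVal& kv) const
    {
        return kv.type_ == tData && kv.key_ == key && kv.val_ == val;
    }
};

// Element predicate: invokes std::find_if twice, once per required pair.
struct HasPairs
{
    item_predicate first, second;
    bool operator()(Iter begin, Iter end) const
    {
        return std::find_if(begin, end, first) != end
            && std::find_if(begin, end, second) != end;
    }
};

// Outer search. seq_begin must point at a tSeqBegin row.
// Returns the first row of the first element for which pred(elem_begin, elem_end)
// is true, or `end` if no element matches.
// ASSUMPTION: val_ of a tSeqBegin row holds that sequence's element count.
template <class Pr>
Iter find_sequence_element(Iter seq_begin, Iter end, Pr pred)
{
    if (seq_begin == end || seq_begin->type_ != tSeqBegin)
        return end;

    int remaining = std::stoi(seq_begin->val_);
    std::vector<int> nested;            // element counts left in open subsequences
    Iter elem_start = seq_begin + 1;

    for (Iter it = elem_start; it != end && remaining > 0; ++it)
    {
        if (it->type_ == tSeqBegin)
        {
            int n = std::stoi(it->val_);
            if (n > 0)
                nested.push_back(n);
        }
        else if (it->type_ == tSeqEnd)
        {
            if (!nested.empty())
            {
                if (--nested.back() == 0)
                    nested.pop_back();  // a subsequence has fully closed
            }
            else
            {
                if (pred(elem_start, it))
                    return elem_start;  // this top-level element matches
                elem_start = it + 1;
                --remaining;
            }
        }
    }
    return end;
}
```

On the sample data from the question, find_sequence_element(rows.begin() + 2, rows.end(), HasPairs{{269, "0"}, {290, "0"}}) should return an iterator to row 7, since the first element has no 290 row and the third has 290="1". Because Pr is a template parameter, a predicate checking more or fewer pairs can be passed without changing the outer search.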