I have a RuleTree data structure which represents a tree of rules that are used to process incoming data items.
- The RuleTree is currently an immutable data structure containing an arbitrary number of (possibly nested) rules.
- Multiple threads will simultaneously apply the same RuleTree to different input data items.
- The RuleTree is applied to input data in one or more phases. It’s up to the calling code to decide which phases to apply.
The typical control flow will be something like:
```
ruleTree.applyStage1(data);
// ... other stuff happens ...
ruleTree.applyStage2(data);
// ... other stuff happens ...
ruleTree.applyStage3(data);
```
This currently works fine. However, I now need to calculate some additional state information during RuleTree processing (e.g. counting the number of matches for a specific rule in the tree). As far as I can see, I have a few options:
- Make the RuleTree mutable so that it can store the state information to be read back later. However, this makes concurrency trickier, since each thread would need its own copy of the RuleTree.
- Add thread-local state to the RuleTree, so that the state information can be calculated and stored within the RuleTree without different threads trampling on each other's state. However, this means that all phases for a given data item must be guaranteed to run on the same thread.
- Have a separate object for state information that is passed as an extra parameter to the RuleTree, e.g. ruleTree.applyStage1(data, state). This keeps the RuleTree nice and immutable, but makes it more complex to use, since the calling code now has to set up and manage the state data separately.
Which approach is likely to be best and why?
Use the "separate object for state information" approach, because it avoids the flaws of the other two: it needs no per-thread copies of the RuleTree, and it does not tie all phases to a single thread. What's more, the chief drawback of the separate-object model, that the caller must pass the state to every method of RuleTree, is easy to deal with.
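Consider a proxy for RuleTree that owns a state object and forwards each call to the shared RuleTree along with that state. I'll use Ruby as a workable approximation of pseudocode (in Ruby, @ denotes an object member variable).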
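A minimal sketch of that idea follows. Several details are assumptions of my own rather than part of any existing API: the RuleTreeState class with its record_match and match_count methods, Ruby-style method names (apply_stage1 instead of applyStage1), and a flattened tree in which each stage is just a list of named predicates (nested rules would recurse the same way). The RuleTree is frozen and receives the state as an argument; the proxy holds one state object and supplies it on every call.

```ruby
# Hypothetical state object: accumulates results for one caller.
# It is mutable, so each caller gets its own instance.
class RuleTreeState
  def initialize
    @match_counts = Hash.new(0)
  end

  # Called by RuleTree whenever a rule matches.
  def record_match(rule_name)
    @match_counts[rule_name] += 1
  end

  # Number of times the named rule matched (0 if it never did).
  def match_count(rule_name)
    @match_counts[rule_name]
  end
end

# Immutable RuleTree (simplified, hypothetical): `stages` has one entry per
# stage, each a list of [rule_name, predicate] pairs. All bookkeeping goes
# into the state argument, so one instance can be shared by many threads.
class RuleTree
  def initialize(stages)
    @stages = stages.map { |rules| rules.map { |rule| rule.dup.freeze }.freeze }.freeze
    freeze
  end

  def apply_stage1(data, state)
    run_stage(0, data, state)
  end

  def apply_stage2(data, state)
    run_stage(1, data, state)
  end

  def apply_stage3(data, state)
    run_stage(2, data, state)
  end

  private

  # Applies every rule of one stage to data, recording matches in state.
  def run_stage(index, data, state)
    @stages.fetch(index, []).each do |name, predicate|
      state.record_match(name) if predicate.call(data)
    end
    data
  end
end

# Proxy: pairs the shared RuleTree with this caller's own state, so the
# caller keeps the original one-argument interface.
class RuleTreeProxy
  attr_reader :state

  def initialize(rule_tree)
    @rule_tree = rule_tree
    @state = RuleTreeState.new
  end

  def apply_stage1(data)
    @rule_tree.apply_stage1(data, @state)
  end

  def apply_stage2(data)
    @rule_tree.apply_stage2(data, @state)
  end

  def apply_stage3(data)
    @rule_tree.apply_stage3(data, @state)
  end
end
```

The design choice is that mutability lives only in the proxy and its state, which are per-caller, while the tree stays frozen and safely shareable.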
Anyone who needs to use RuleTree creates a RuleTreeProxy and calls it instead, using the original one-argument methods.
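A minimal sketch of that usage, assuming condensed, hypothetical versions of RuleTree, RuleTreeState and RuleTreeProxy (repeated here so the block runs on its own; the rules and inputs are made up for illustration). A single frozen tree is shared by every thread, and each thread wraps it in its own proxy, so no two threads ever write to the same state object.

```ruby
# Hypothetical per-caller state: counts matches per rule name.
class RuleTreeState
  def initialize
    @match_counts = Hash.new(0)
  end

  def record_match(rule_name)
    @match_counts[rule_name] += 1
  end

  def match_count(rule_name)
    @match_counts[rule_name]
  end
end

# Hypothetical immutable tree; defines apply_stage1..apply_stage3(data, state).
class RuleTree
  def initialize(stages)
    @stages = stages.map { |rules| rules.map { |rule| rule.dup.freeze }.freeze }.freeze
    freeze
  end

  (1..3).each do |n|
    define_method("apply_stage#{n}") do |data, state|
      @stages.fetch(n - 1, []).each do |name, predicate|
        state.record_match(name) if predicate.call(data)
      end
      data
    end
  end
end

# Hypothetical proxy: one per caller; it owns that caller's state.
class RuleTreeProxy
  attr_reader :state

  def initialize(rule_tree)
    @rule_tree = rule_tree
    @state = RuleTreeState.new
  end

  (1..3).each do |n|
    define_method("apply_stage#{n}") { |data| @rule_tree.public_send("apply_stage#{n}", data, @state) }
  end
end

# One shared, immutable tree...
SHARED_TREE = RuleTree.new([
  [["positive", ->(x) { x > 0 }]],
  [["even", ->(x) { x.even? }]],
  [["large", ->(x) { x > 100 }]]
])

# ...and one proxy per thread, so each thread accumulates its own counts.
inputs_per_thread = [[1, 2, 3], [-4, 200]]
proxies = inputs_per_thread.map do |inputs|
  Thread.new do
    proxy = RuleTreeProxy.new(SHARED_TREE)
    inputs.each do |data|
      proxy.apply_stage1(data)
      # ... other stuff happens ...
      proxy.apply_stage2(data)
      # ... other stuff happens ...
      proxy.apply_stage3(data)
    end
    proxy
  end
end.map(&:value) # Thread#value joins the thread and returns the block's result
```

Because each proxy is private to the thread that created it, no locking is needed here; the shared tree is only ever read.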
The state object provides accessors for retrieving useful information about the processing done by RuleTree, such as the number of matches for a given rule.
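For example, a hypothetical RuleTreeState might look like the sketch below. The method names are illustrative only, and RuleTree is assumed to call record_match whenever a rule matches; in this standalone snippet that call is made by hand to stand in for the tree.

```ruby
# Hypothetical state object for one caller's pass(es) through the RuleTree.
# RuleTree is assumed to call record_match whenever a rule matches; the
# remaining methods are read-only accessors for the caller.
class RuleTreeState
  def initialize
    @match_counts = Hash.new(0)
  end

  # Write side: used only by RuleTree while processing.
  def record_match(rule_name)
    @match_counts[rule_name] += 1
  end

  # How many times the named rule matched (0 if it never did).
  def match_count(rule_name)
    @match_counts[rule_name]
  end

  # Total matches across all rules.
  def total_matches
    @match_counts.values.sum
  end

  # Names of every rule that matched at least once, in first-match order.
  def matched_rule_names
    @match_counts.keys
  end

  # Frozen snapshot the caller can keep without being able to alter the counts.
  def to_h
    @match_counts.dup.freeze
  end
end

# Stand-in for what RuleTree would do during processing.
state = RuleTreeState.new
%w[positive even positive].each { |name| state.record_match(name) }

state.match_count("positive") # => 2
state.match_count("large")    # => 0
state.total_matches           # => 3
state.matched_rule_names      # => ["positive", "even"]
```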
If the different phases must be able to run on different threads, then either ensure that no two threads operate on the same RuleTreeProxy instance at the same time, or add appropriate synchronization to RuleTreeProxy.
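A minimal sketch of the second option, assuming hypothetical, condensed RuleTree and RuleTreeState classes (defined inline so the block runs on its own). SynchronizedRuleTreeProxy is a name of my own; it holds a Ruby Mutex, and every stage call and every read of the state goes through it. Note that the lock only prevents overlapping access to the state; it does not enforce that stage 1 runs before stage 2, which remains the caller's responsibility.

```ruby
# Hypothetical per-caller state: counts matches per rule name.
class RuleTreeState
  def initialize
    @match_counts = Hash.new(0)
  end

  def record_match(rule_name)
    @match_counts[rule_name] += 1
  end

  def match_count(rule_name)
    @match_counts[rule_name]
  end
end

# Hypothetical immutable tree; each stage is a list of [name, predicate] pairs.
class RuleTree
  def initialize(stages)
    @stages = stages.map { |rules| rules.map { |rule| rule.dup.freeze }.freeze }.freeze
    freeze
  end

  (1..3).each do |n|
    define_method("apply_stage#{n}") do |data, state|
      @stages.fetch(n - 1, []).each do |name, predicate|
        state.record_match(name) if predicate.call(data)
      end
      data
    end
  end
end

# Proxy whose methods may be called from different threads: a per-proxy
# Mutex ensures only one call touches @state at a time. It does NOT
# enforce stage order; the caller still decides when each stage runs.
class SynchronizedRuleTreeProxy
  def initialize(rule_tree)
    @rule_tree = rule_tree
    @state = RuleTreeState.new
    @lock = Mutex.new
  end

  (1..3).each do |n|
    define_method("apply_stage#{n}") do |data|
      @lock.synchronize { @rule_tree.public_send("apply_stage#{n}", data, @state) }
    end
  end

  # Reads take the same lock, so they never observe a half-finished update.
  def match_count(rule_name)
    @lock.synchronize { @state.match_count(rule_name) }
  end
end

tree = RuleTree.new([
  [["positive", ->(x) { x > 0 }]],
  [["even", ->(x) { x.even? }]],
  []
])
proxy = SynchronizedRuleTreeProxy.new(tree)

# Stage 1 and stage 2 driven from two different threads on the same proxy.
inputs = (1..1000).to_a
threads = [
  Thread.new { inputs.each { |x| proxy.apply_stage1(x) } },
  Thread.new { inputs.each { |x| proxy.apply_stage2(x) } }
]
threads.each(&:join)

proxy.match_count("positive") # => 1000
proxy.match_count("even")     # => 500
```

Running the two stages concurrently here only serves to exercise the lock; without it, nothing guarantees that the read-modify-write += on the shared Hash is atomic.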