I’m trying to refactor some C++ code due to performance problems, and I’m wondering about the best way to solve this. I have a class, say DataGatherer, which is a core component of a large system. This class is serialized and sent over data streams, copied and stored into tables, and copied for the sake of concurrency (checking out a copy rather than having access to the original DataGatherer object). I list these examples just to let you know that it has copy and assignment operators that are both used.
The problem is that the DataGatherer objects can become extremely large because they contain essentially a large collection of gathered data as well as statistics and metadata about the data. In many cases, it’s only the statistics and metadata that are required, and not the backend data collection.
Are there any design patterns that might be helpful here? Maybe the more general question is, what do you do when in most cases you only need part of an object, but the object is so tightly coupled internally that splitting it up is next to impossible?
Ideas I’ve had:
- Split the class into a DataGatherer class and a pointer to a DataGathererBackend object. Return shallow copies in most cases to avoid all the overhead of copying DataGathererBackend around when it isn’t used. I hate to do this because of all the hassle of deciding when you want a shallow copy vs. when you want a deep copy, and the general messiness from having to resort to DataGatherer objects that have NULL pointers for backend objects because the backend wasn’t needed in those cases.
- Split the class into DataStatistics and Data, and make a third class that wraps and contains both.
- Other ideas?
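To make the first idea concrete, here is a rough sketch of one way it might look if the backend were held through a `std::shared_ptr` and cloned only on write. This is a variation on the idea, not the existing code: the member names (`samples`, `sampleCount_`, `add`, `sharesBackendWith`) are hypothetical, and the backend pointer is never NULL in this version.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the large gathered-data collection.
struct DataGathererBackend {
    std::vector<double> samples;
};

class DataGatherer {
public:
    DataGatherer() : backend_(std::make_shared<DataGathererBackend>()) {}

    // Copies duplicate the small members and share the backend (a reference
    // count bump, not a copy of the samples). Declaring the copy operations
    // suppresses the implicit moves, so backend_ is never left null.
    DataGatherer(const DataGatherer&) = default;
    DataGatherer& operator=(const DataGatherer&) = default;

    void add(double value) {
        detach();  // clone first if another copy still shares the backend
        backend_->samples.push_back(value);
        ++sampleCount_;
    }

    std::size_t sampleCount() const { return sampleCount_; }
    const std::vector<double>& samples() const { return backend_->samples; }

    bool sharesBackendWith(const DataGatherer& other) const {
        return backend_ == other.backend_;
    }

private:
    // Copy-on-write: deep-copy the backend only when it is shared.
    void detach() {
        assert(backend_ != nullptr);
        if (backend_.use_count() > 1) {
            backend_ = std::make_shared<DataGathererBackend>(*backend_);
        }
    }

    std::shared_ptr<DataGathererBackend> backend_;
    std::size_t sampleCount_ = 0;  // stands in for the statistics/metadata
};
```

With this shape the shallow-vs-deep decision moves inside the class: every copy is shallow, and the deep copy happens only when a shared object is modified. One caveat: the `use_count()` check is only trustworthy when no other thread can be copying the same DataGatherer object at that moment, so the concurrent check-out path would need its own synchronization.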
If only the statistics are wanted, how about a struct of statistics that you can return by value or by const reference, changing the retrieval from the copy operators to a call to a statistics getter function?
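A minimal sketch of what that could look like, assuming the statistics can be kept in a small value type. The names `GathererStatistics`, `statistics()`, `statisticsSnapshot()`, and the fields shown are made up for illustration; the real class will have its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical statistics/metadata bundle: small and cheap to copy.
struct GathererStatistics {
    std::size_t sampleCount = 0;
    double sum = 0.0;

    double mean() const {
        return sampleCount ? sum / static_cast<double>(sampleCount) : 0.0;
    }
};

class DataGatherer {
public:
    void add(double value) {
        samples_.push_back(value);  // the large backend collection grows
        ++stats_.sampleCount;       // the statistics are updated alongside it
        stats_.sum += value;
    }

    // No copy at all; the reference is valid only while this object lives
    // and reflects later updates.
    const GathererStatistics& statistics() const { return stats_; }

    // Copies only the small struct, never the sample collection.
    GathererStatistics statisticsSnapshot() const { return stats_; }

private:
    std::vector<double> samples_;  // stands in for the gathered data
    GathererStatistics stats_;
};

// Callers that only need statistics take the snapshot, not the gatherer.
inline double meanOf(const DataGatherer& gatherer) {
    const GathererStatistics stats = gatherer.statisticsSnapshot();
    assert(stats.sampleCount > 0);
    return stats.mean();
}
```

The by-value snapshot is the one to hand across threads or store in tables, since it stays fixed after it is taken; the const reference suits short-lived reads on the owning side.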