I have some code that reads in and processes dozens of input sources. The inputs are, for now, mostly CSV with a few special fields, but the full structure and contents of each CSV file depend on the input source. It’s been suggested that each input type will occasionally need special processing to ‘fix’ some known issue with the input. However, what needs to be done, if anything, will vary with the unique issues of each input source; there is no commonality between the error detection needed for different sources. My code will eventually need to perform these sorts of error corrections, and when I’m given a new issue I should be able to have my code start correcting it as well.
Obviously this isn’t easy to address, since I don’t know exactly what logic will need to be run until it’s handed to me. My manager (less of a developer) implied that he would hardcode each unique piece of logic as it’s received, but I don’t like this option. If I want to avoid a massive amount of obscure hard-coded logic, I need some sort of configuration method that allows a developer to define a generic way of testing inputs and possibly modifying/correcting known bad fields.
In addition, my program will be running continuously on streaming data. Ideally I would have a way to add a new form of error correction to the code without having to stop and restart it.
So what is the best way of making such generic logic configurable for each source? One option that comes to mind is having some sort of ‘language’ in my configuration file that allows the developer to say things like “if field3 > field2 discard”, which seems as if it would be hard to develop and still wouldn’t update automatically unless my code periodically goes back and checks the config file for changes. Another is possibly having a shared object that contains these types of checks, to which a developer can add a new error-check function each time someone provides us with a new issue that needs to be addressed; but this is still barely one step above hard-coding the logic.
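To make the first option concrete, here is a rough sketch of how a single rule of the form “if field3 > field2 discard” (without the leading “if”) might be parsed and tested against a record. The names `Rule`, `parseRule` and `ruleMatches` are ones I made up for illustration, and I’m assuming 1-based field numbers, numeric comparison, and only the `<` and `>` operators. A real rule language would need a lot more than this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical rule of the form "fieldA <op> fieldB <action>",
// e.g. "field3 > field2 discard". Field numbers in the text are 1-based.
struct Rule {
    std::size_t lhs;     // 0-based index of the left-hand field
    std::size_t rhs;     // 0-based index of the right-hand field
    char op;             // '<' or '>'
    std::string action;  // e.g. "discard"
};

// Turns "field3" into the 0-based index 2. Returns false on malformed input.
static bool parseFieldRef(const std::string& token, std::size_t& index) {
    const std::string prefix = "field";
    if (token.size() <= prefix.size() ||
        token.compare(0, prefix.size(), prefix) != 0)
        return false;
    const std::string digits = token.substr(prefix.size());
    if (digits.find_first_not_of("0123456789") != std::string::npos)
        return false;
    const long n = std::atol(digits.c_str());
    if (n < 1) return false;
    index = static_cast<std::size_t>(n - 1);
    return true;
}

// Parses one rule line from the configuration. Returns false if malformed.
bool parseRule(const std::string& line, Rule& rule) {
    std::istringstream in(line);
    std::string lhs, op, rhs;
    if (!(in >> lhs >> op >> rhs >> rule.action)) return false;
    if (op != "<" && op != ">") return false;
    rule.op = op[0];
    return parseFieldRef(lhs, rule.lhs) && parseFieldRef(rhs, rule.rhs);
}

// Returns true if the rule's condition holds for this record, i.e. the
// rule's action should be taken. Fields are compared as numbers.
bool ruleMatches(const Rule& rule, const std::vector<std::string>& record) {
    assert(rule.op == '<' || rule.op == '>');
    if (rule.lhs >= record.size() || rule.rhs >= record.size()) return false;
    const double a = std::atof(record[rule.lhs].c_str());
    const double b = std::atof(record[rule.rhs].c_str());
    return rule.op == '>' ? a > b : a < b;
}
```

The rules would be parsed once when the configuration is loaded, so the per-record cost is just the two conversions and a comparison. Picking up changes without a restart would still require re-reading the file periodically, as noted above.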
Any suggestions on the best design approach for this? Are there any existing libraries which might do part of the work for me? I’m working with pure C++ (not C++11).
EDIT: Thank you all for your input, but I failed to mention one major detail. This is all happening during stream processing under heavy load, so I need to ensure that whatever method I choose doesn’t add too much overhead. I’ll look into incorporating a scripting language, but I’m worried that it may ‘cost’ too much.
I ended up deciding to use a plugin architecture. Each input gets its own plugin, loaded at runtime. This is fast, keeps the code pretty well encapsulated, and lets me detect new plugins and hot-swap in the new code while the program is running.
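For anyone curious, the general shape of such a design can be sketched roughly as follows. This is a simplified illustration rather than my actual code: `InputPlugin`, `PluginRegistry`, `DropEmptyId` and `FillEmptyId` are hypothetical names, and the two example plugins are defined inline. In a real setup each plugin would live in its own shared library (on a POSIX system, for example, opened with `dlopen` and exposing a factory function found with `dlsym`); that loading step is left out here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical interface that every per-source plugin implements.
class InputPlugin {
public:
    virtual ~InputPlugin() {}
    // May modify the record in place. Returns false to discard the record.
    virtual bool process(std::vector<std::string>& record) = 0;
};

// Owns one plugin per input source and allows replacing it at runtime.
class PluginRegistry {
public:
    PluginRegistry() {}
    ~PluginRegistry() {
        for (Map::iterator it = plugins_.begin(); it != plugins_.end(); ++it)
            delete it->second;
    }

    // Installs (or hot-swaps) the plugin for a source; takes ownership.
    void install(const std::string& source, InputPlugin* plugin) {
        assert(plugin != 0);
        InputPlugin*& slot = plugins_[source];  // null if newly created
        delete slot;                            // deleting null is a no-op
        slot = plugin;
    }

    // Runs the source's plugin; sources without a plugin pass through.
    bool process(const std::string& source,
                 std::vector<std::string>& record) const {
        Map::const_iterator it = plugins_.find(source);
        return it == plugins_.end() ? true : it->second->process(record);
    }

private:
    typedef std::map<std::string, InputPlugin*> Map;
    Map plugins_;
    // Non-copyable (C++03 idiom): declared but never defined.
    PluginRegistry(const PluginRegistry&);
    PluginRegistry& operator=(const PluginRegistry&);
};

// Example fix, version 1: discard records whose first field is empty.
class DropEmptyId : public InputPlugin {
public:
    virtual bool process(std::vector<std::string>& record) {
        return !record.empty() && !record[0].empty();
    }
};

// Example fix, version 2: repair the record instead of discarding it.
class FillEmptyId : public InputPlugin {
public:
    virtual bool process(std::vector<std::string>& record) {
        if (record.empty()) return false;
        if (record[0].empty()) record[0] = "UNKNOWN";
        return true;
    }
};
```

Calling `install` a second time for the same source replaces the old plugin, which is the hot-swap. The per-record cost is one map lookup plus one virtual call. Note that this sketch has no locking, so if the swap happens on a different thread from the one processing records, the two would need to be synchronized.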