Consider the following code:
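#include <array>
#include <vector>
using std::array;
using std::vector;

// N and M are compile-time constants chosen per problem.
struct TrainingExample
{
    array<double, N> input;   // easily measured characteristics
    array<double, M> output;  // harder-to-measure characteristics
};

struct Predictor
{
    // "Studies" the training set.
    Predictor(const vector<TrainingExample>& trainingSet);
    // Returns a guess at the output for the given input.
    array<double, M> predict(const array<double, N>& input);
};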
The class is used as follows:
- Map some easily measured characteristics of an entity type to an array of N input doubles.
- Map some harder-to-measure characteristics of the same entity type to an array of M output doubles.
- Sample the universe of entities, measuring both input and output.
- Pass this data to the constructor of Predictor as trainingSet, which then “studies” it.
- Measure the input of a subject entity and pass it to the predict function.
- predict returns a guess at the output based on the training examples.
My question is: assuming this class had to be reused across many different problems/models without modifying the code for each specific problem, which machine learning algorithm would be best for implementing such a general-purpose Predictor? (If there is no clear best one in your opinion, then what are some of the popular competing algorithms, and how do you select between them?)
Well, without general knowledge of the problem it is almost impossible to answer your question. You have basically described the process of machine learning itself: take input, study it, generate the parameters of a model, and then predict results for a validation set. Which algorithm to use depends on the insight you bring about the problem itself.
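One common way to pick between candidate algorithms is to hold back part of the sampled data as a validation set and compare each candidate's error on it. The following is only a rough sketch of that idea: the sizes N = 1 and M = 1, the validationError helper, and the MeanPredictor baseline are all hypothetical names made up for illustration, and mean squared error is just one possible score.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sizes for illustration; the question leaves N and M open.
constexpr std::size_t N = 1;
constexpr std::size_t M = 1;

struct TrainingExample
{
    std::array<double, N> input;
    std::array<double, M> output;
};

// Hypothetical baseline candidate: ignores the input and always
// predicts the mean output seen in the training set.
struct MeanPredictor
{
    explicit MeanPredictor(const std::vector<TrainingExample>& trainingSet)
    {
        assert(!trainingSet.empty());
        mean.fill(0.0);
        for (const TrainingExample& ex : trainingSet)
            for (std::size_t j = 0; j < M; ++j)
                mean[j] += ex.output[j] / static_cast<double>(trainingSet.size());
    }

    std::array<double, M> predict(const std::array<double, N>&) const
    {
        return mean;
    }

    std::array<double, M> mean;
};

// Mean squared error of any model exposing predict(input), measured on
// examples that were NOT used for training. Lower is better.
template <typename Model>
double validationError(Model& model,
                       const std::vector<TrainingExample>& validationSet)
{
    assert(!validationSet.empty());
    double sum = 0.0;
    for (const TrainingExample& ex : validationSet) {
        const std::array<double, M> guess = model.predict(ex.input);
        for (std::size_t j = 0; j < M; ++j) {
            const double err = guess[j] - ex.output[j];
            sum += err * err;
        }
    }
    return sum / static_cast<double>(validationSet.size() * M);
}
```

Any other candidate with the same predict signature can be scored by the same function, and the one with the lowest validation error on your data is the one to prefer. A trivial baseline such as the mean is useful mainly as a floor that a real algorithm should beat.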
Neural networks (typically trained with a gradient descent learning rule) usually give good results in many different domains. In many cases Bayesian models perform really well, and case-based reasoning is often used for discrete, recurring inputs, and so on. It is up to you to choose one based on the definition of your problem.
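To make the interface from the question concrete, here is a minimal sketch of one possible implementation. It uses a 1-nearest-neighbour lookup, chosen only because it is short and loosely in the spirit of reasoning from stored cases; it is not a recommendation over the approaches named above. The class name NearestNeighborPredictor and the sizes N = 2 and M = 1 are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sizes for illustration; the question leaves N and M open.
constexpr std::size_t N = 2;
constexpr std::size_t M = 1;

struct TrainingExample
{
    std::array<double, N> input;
    std::array<double, M> output;
};

// 1-nearest-neighbour predictor: "studying" simply stores the examples.
struct NearestNeighborPredictor
{
    explicit NearestNeighborPredictor(const std::vector<TrainingExample>& trainingSet)
        : examples(trainingSet)
    {
        assert(!examples.empty() && "need at least one training example");
    }

    // Returns the output of the stored example whose input is closest
    // (by squared Euclidean distance) to the query input.
    std::array<double, M> predict(const std::array<double, N>& input) const
    {
        double bestDist = std::numeric_limits<double>::infinity();
        const TrainingExample* best = &examples.front();
        for (const TrainingExample& ex : examples) {
            double dist = 0.0;
            for (std::size_t i = 0; i < N; ++i) {
                const double d = ex.input[i] - input[i];
                dist += d * d;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = &ex;
            }
        }
        return best->output;
    }

private:
    std::vector<TrainingExample> examples;
};
```

This approach needs no training beyond storing the data, but every prediction scans the whole training set, and it is sensitive to how the input dimensions are scaled. Whether that trade-off is acceptable again depends on the problem.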