I have 7000 data instances.
Those instances have been manually scored by a human (the reference).
I have several engines that score the data automatically.
I have an Excel sheet in which each column holds one engine’s scores, plus one column with the manually scored data.
I want to know which of the engines is closest to the human’s scoring, using Excel’s functions or programming; or just give me the simple maths of it and I’ll work it out.
Scores range from -3.0 to +3.0.
The application is written in C# and uses the .NET Excel COM libraries to access the Excel sheet.
-UPDATE-
Statistically speaking, what’s the best way to describe the error? The human scores tend to be close to neutral (0), but the engines’ scores tend to be biased (magnitude above 1.5, in either direction). I want to determine the best equation to describe and exaggerate the error in the right way.
I would suggest using the mean squared error (MSE). For each data instance, calculate the square of the difference between each engine’s score and the human score. Squaring exaggerates large errors and makes every term positive. Then take the average of the squared errors for each engine. The engine with the lowest average would be the “closest” estimator to the human.
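To make the arithmetic concrete, here is a minimal sketch in Python; the same loop ports directly to C#. The engine names and the sample scores are made up for illustration, and reading the columns from the Excel sheet is left out. In Excel itself, something like =SUMXMY2(B2:B7001,A2:A7001)/COUNT(A2:A7001) should give the same number, assuming the human scores are in column A and one engine's scores are in column B.

```python
def mean_squared_error(reference, predicted):
    """Average of squared differences between two equal-length score lists."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum((p - r) ** 2 for r, p in zip(reference, predicted)) / len(reference)


def rank_engines(human, engines):
    """Return (engine name, MSE) pairs sorted from closest to farthest."""
    scores = {name: mean_squared_error(human, values)
              for name, values in engines.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical sample data: four instances instead of 7000.
human = [0.0, 0.5, -0.5, 0.0]
engines = {
    "engine_a": [1.5, 2.0, 1.0, 1.5],   # consistently 1.5 too high
    "engine_b": [0.0, 1.0, -1.0, 0.5],  # small errors in both directions
}

ranking = rank_engines(human, engines)
for name, mse in ranking:
    print(name, mse)
```

The first entry in ranking is the engine with the lowest mean squared error, i.e. the one closest to the human scores by this measure.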