Summary
I have been looking at historical Australian Rules outcomes using Excel, with an eye on Betfair odds, to see if there is an opportunity to better predict future match outcomes. My progress to date is covered in more detail under Background below.
I’d now like to go a step further and look at data mining / pattern matching / algorithm techniques that I could possibly implement. I have had some experience with dynamic models (Extend) and with Solver in Excel for optimisation, but I am unfamiliar with data mining other than the term itself.
Are there viable data mining programming techniques available to me to deploy for this analysis in VBA?
(I realise this question may be seen as borderline by some, but I think Stack Overflow is better suited to it than, say, Math: I am keen to understand potential programming options/algorithms that I can apply in VBA.)
My strong preference is to look at this with VBA / VBScript, as this is my coding background, but I am open to other options if they are significantly better.
Background
I have extracted the data for Australian Rules football over the last few years into Excel. This data gives me:
- Quarter by Quarter results
(for example, WWWL means team 1 led for the first three quarters before losing the game; DLLL means teams 1 and 2 were level at the end of the first quarter, then team 2 led for the remainder of the match)
- The same info regrouped into Half by Half results
- Home and Away teams (team 1 is home, team 2 away)
- Match Day Stadium
- Month of the year
Which I then match up to other data sets such as
- League ladder by week (completed)
- Whether the stadium is open air or closed (completed)
- Bookmakers odds pre game (to do)
- What happened with the weather conditions for open air stadiums (to do)
And then dice and splice with PivotTables (perhaps PowerPivot) to interrogate this data to look for gaming opportunities, for example:
- Do certain teams tend to lead from start to finish (WWWW) more often than others, and do the odds for a “Four Quarter” win (WWWW) pay disproportionately more than this likelihood would indicate for a vanilla win (so Lay the vanilla win, Back the WWWW)
- Looking for marked differentials in Home and Away performances (i.e. does home ground knowledge, or partisan home crowd support, lead to more reversals of three-quarter-time scores)
- Comparing results of open-air versus closed roof stadiums (removing weather impact)
- Does long-distance travel one week affect the outcome in the following week(s)
- Do certain teams produce certain scoring patterns more often than standard league results
- Is a lower ranked team more likely to lead throughout an entire match than come from behind to beat a higher ranked team
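As a rough illustration of the first question in the list above, the sketch below counts how often each home team posts a WWWW pattern and compares that rate with the probability implied by decimal odds. It is written in Python rather than VBA purely for brevity. The match records, team names and odds are made-up placeholders, and `wire_to_wire_rate` and `implied_probability` are hypothetical helper names, not part of any library.

```python
from collections import Counter

# Hypothetical sample: (home team, quarter-by-quarter pattern seen from team 1, the home side)
matches = [
    ("Team A", "WWWW"),
    ("Team A", "WWWL"),
    ("Team A", "WWWW"),
    ("Team B", "DLLL"),
    ("Team B", "WWWW"),
    ("Team B", "LLWW"),
]


def wire_to_wire_rate(records):
    """Return {team: share of that team's home games with pattern WWWW}."""
    games = Counter(team for team, _ in records)
    wins = Counter(team for team, pattern in records if pattern == "WWWW")
    return {team: wins[team] / games[team] for team in games}


def implied_probability(decimal_odds):
    """Probability implied by decimal odds, ignoring commission and overround."""
    return 1.0 / decimal_odds


rates = wire_to_wire_rate(matches)
assumed_odds = 4.0  # placeholder price for a "Four Quarter" win, not real market data
for team, rate in sorted(rates.items()):
    edge = rate - implied_probability(assumed_odds)
    print(f"{team}: observed {rate:.2f}, implied {implied_probability(assumed_odds):.2f}, edge {edge:+.2f}")
```

With only a few seasons of matches the per-team counts will be small, so any apparent edge of this kind would need checking against sample size before being acted on.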
Check out RapidMiner; it has a number of built-in tools to explore your data. Assuming you are reasonably comfortable with computer tools, also check out Weka, which is a machine learning tool. If you annotate your data, you can train algorithms on it and see which is the most accurate at predicting the winner.
For example
Team A plays Team B: you’d basically have to represent the flow of the game on one line of a CSV file, with any additional stats on the same line, and then in the very last column say which team won. The column that says which team won is what the algorithm is trained on.
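To make the layout concrete, here is a minimal sketch of such a file together with a deliberately simple learner. It is in Python for brevity and uses only the standard library; it is not what Weka or RapidMiner do internally. The column names (`q1`..`q3`, `winner`), the teams and the rows are all invented for illustration, and the "model" is just a lookup table that predicts the most frequent winner seen for a given three-quarter-time state.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical training file: one match per line, winner label in the last column.
# q1..q3 give the home team's state at each break (W = leading, L = trailing, D = level).
raw = """home,away,q1,q2,q3,winner
Team A,Team B,W,W,W,home
Team A,Team C,W,L,L,away
Team B,Team A,L,L,W,home
Team C,Team B,W,W,L,away
Team B,Team C,W,W,W,home
Team C,Team A,L,L,L,away
"""

rows = list(csv.DictReader(io.StringIO(raw)))


def train(rows, features):
    """Count winners for each combination of the chosen feature values."""
    table = defaultdict(Counter)
    for row in rows:
        key = tuple(row[f] for f in features)
        table[key][row["winner"]] += 1
    return table


def predict(table, key, default="home"):
    """Return the most frequent winner seen for this key, else the default."""
    if key not in table:
        return default
    return table[key].most_common(1)[0][0]


model = train(rows, ["q3"])
print(predict(model, ("W",)))  # home led at three-quarter time
print(predict(model, ("L",)))  # home trailed at three-quarter time
```

The same CSV, with the winner in the last column, is the shape a tool like Weka would expect for a classification task; swapping the lookup table for a proper algorithm and holding back some matches for testing is how you would compare accuracy.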