Data alone isn’t enough. You need understanding and discipline to be successful.
I think it takes discipline combined with a solid understanding of the sport you’re trying to analyse.
If you don’t understand the context behind your numbers, no amount of advanced analysis or complicated algorithms will help you make sense of them. You need discipline because it is very easy to lock in on a particular theory and pay attention only to the data points that confirm it. A good practice is to set aside a portion of your data before you start analysing. Once you’ve built what you think is your “best” model or theory, you can then test it against this held-out data.
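The holdout practice described above can be sketched in a few lines. This is a minimal illustration using a hypothetical `holdout_split` helper and synthetic match records, not any specific tooling from the interview:

```python
import random

def holdout_split(records, holdout_frac=0.25, seed=42):
    """Shuffle the records and set aside a fraction as a holdout set.

    The holdout set is only consulted after the model or theory is
    finalised, so it cannot influence which patterns you choose to trust.
    """
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    shuffled = records[:]              # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_frac))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical example: 100 match records, 25% reserved for testing
matches = list(range(100))
train, holdout = holdout_split(matches)
print(len(train), len(holdout))  # → 75 25
```

A time-ordered split (training on earlier seasons, testing on later ones) is often the safer choice for sports data, since a random shuffle can leak information across a season.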
Sports outcomes are largely random, and the market is no more accurate than it was 16 years ago.
I’d say the most surprising phenomenon in sports data sets is the inherent randomness of most outcomes and their lack of repeatability. For example, in the NFL, the occasions when a team wins possession of the ball through a defensive action appear to be largely random, with little correlation from week to week. That creates potential betting opportunities, because the market appears to give more weight to defensive turnovers than the data indicate.
What’s more, despite all the extra data we now have access to today, the market does not appear to be getting more accurate, at least when it comes to the NFL. From 1989 to 2000, the bookies’ favourite won 66.7% of the time. Between 2001 and 2012, the bookies’ favourite won 66.9% of the time.
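The week-to-week repeatability claim above is the kind of thing you can check directly: compute the correlation between each week’s turnover count and the next week’s. This is a minimal sketch using a hand-rolled Pearson correlation and hypothetical turnover counts, not the interviewee’s actual data:

```python
import random

def week_to_week_correlation(series):
    """Pearson correlation between each week's value and the following week's.

    A value near zero suggests the stat does not carry over from one
    week to the next, i.e. it is not repeatable.
    """
    x, y = series[:-1], series[1:]     # pair week t with week t+1
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical: random turnover counts over a 17-week season
rng = random.Random(0)
turnovers = [rng.randint(0, 4) for _ in range(17)]
print(round(week_to_week_correlation(turnovers), 3))
```

On genuinely random counts like these, the correlation hovers near zero; a stat the market should trust week to week would show a clearly positive value instead.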