Can we design a model that accurately predicts whether the home team will win a football match?
Sports betting is a 500 billion dollar market (Sydney Herald)
Kaggle hosts a yearly competition called March Madness
Several papers have been written on this:
"It is possible to predict the winner of English county twenty twenty cricket games in almost two thirds of instances."
"Something that becomes clear from the results is that Twitter contains enough information to be useful for predicting outcomes in the Premier League"
For the 2014 World Cup, Bing correctly predicted the outcomes of all 15 games in the knockout round.
So the right questions to ask are:

- What model should we use?
- Which features (aspects of a game) matter most for predicting a team win? Does being the home team give a team an advantage?
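
A quick way to start on the home-advantage question is to measure how often the home team actually wins, which also gives a baseline that any model should beat. The sketch below is a minimal illustration, not the method used here: the result codes `'H'`, `'D'`, `'A'` (home win, draw, away win), the function name `home_win_rate`, and the toy data are all assumptions made up for the example.

```python
from collections import Counter


def home_win_rate(results):
    """Return the fraction of matches won by the home team.

    `results` is a sequence of full-time result codes. The codes
    'H' (home win), 'D' (draw) and 'A' (away win) are an assumption
    made for this sketch.
    """
    if not results:
        raise ValueError("results must not be empty")
    counts = Counter(results)
    return counts["H"] / len(results)


# Made-up toy data for illustration only, not real match results.
toy_results = ["H", "A", "H", "D", "H", "A", "H", "D"]
rate = home_win_rate(toy_results)
print(f"Home win rate: {rate:.2f}")
```

If this rate sits well above the away-win rate on real data, always predicting a home win becomes the baseline that any trained model needs to beat.
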
![Logistic regression](https://image.slidesharecdn.com/logisticregression-predictingthechancesofcoronaryheartdisease-091203130638-phpapp01/95/logistic-regression-predicting-the-chances-of-coronary-heart-disease-2-728.jpg?cb=1259845609 "Logistic regression")
Support Vector Machine
XGBoost appears to be the best of these models: it has the highest F1 score and the highest accuracy on the test set.
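
For reference, the two metrics used in this comparison can be computed directly from test-set predictions. The following is a minimal pure-Python sketch, assuming a binary target where `"H"` marks a home win and `"NH"` marks anything else; the labels, the function name `accuracy_and_f1`, and the toy predictions are hypothetical and do not come from any of the models above.

```python
def accuracy_and_f1(y_true, y_pred, positive="H"):
    """Return (accuracy, F1) for binary labels.

    F1 is the harmonic mean of precision and recall for the
    `positive` class. The label "H" is an assumption of this sketch.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Made-up labels and predictions, for illustration only.
y_true = ["H", "H", "NH", "NH", "H", "NH"]
y_pred = ["H", "NH", "NH", "H", "H", "NH"]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, f1={f1:.3f}")
```

Reporting both numbers matters when classes are imbalanced: a model that always predicts the majority class can have decent accuracy while its F1 for the other class is zero.
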
- Adding sentiment from Twitter and news articles
- More features from other data sources (how much others bet, player-specific health stats)