In 2017, the BBC’s Match of the Day introduced a new statistic in its post-match summaries of Premier League games. Expected goals, or xG, is designed to tell us how many goals a team should have scored based on the quality of the chances it created in a match. It is loved by amateur and professional statisticians alike who want to use data to analyze performance.
The BBC uses xG regularly in its Premier League coverage, but the metric was absent from both BBC and ITV coverage of the recent men’s World Cup. A quick look at what xG is, and at the history of using data to predict football matches, can give us some insight into why the broadcasters decided not to use it.
The expected goals concept originally came from ice hockey but can easily be applied to football. xG is calculated by looking at each shot taken by a team in a match and assigning it a probability of being scored.
This probability is calculated by looking at shots from similar situations in historical matches and calculating what percentage of them resulted in goals. By adding up the probabilities of all shots taken by a team, we get the expected goals for the whole game.
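The calculation described above can be sketched in a few lines of Python. This is only a rough illustration: the situation labels and the historical shot record are invented for the example, and real xG models group shots by far richer features (distance, angle, type of assist and so on) than a single label.

```python
from collections import defaultdict


def scoring_rates(historical_shots):
    """Map each situation label to the fraction of past shots that were goals."""
    goals = defaultdict(int)
    attempts = defaultdict(int)
    for situation, scored in historical_shots:
        attempts[situation] += 1
        goals[situation] += int(scored)
    return {s: goals[s] / attempts[s] for s in attempts}


def expected_goals(match_shots, rates):
    """Add up the scoring probability of every shot a team took in one match."""
    return sum(rates[situation] for situation in match_shots)


# Invented historical record (an assumption for illustration only):
# each entry is (situation label, whether the shot was scored).
history = (
    [("penalty", True)] * 3 + [("penalty", False)] * 1
    + [("long_range", True)] * 1 + [("long_range", False)] * 19
)

rates = scoring_rates(history)  # penalty: 3 of 4, long_range: 1 of 20
team_xg = expected_goals(["penalty", "long_range", "long_range"], rates)
print(f"{team_xg:.2f}")  # 0.85
```

In this toy record a team that takes one penalty and two long-range shots ends the match with 0.85 xG, whether or not any of those shots actually went in.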
Consider the Premier League match between Tottenham and Liverpool in November 2022, where Liverpool won 2-1. Liverpool got just 1.18 xG from 13 shots in the match, while Tottenham got 1.21 xG from 14 shots.
In his post-match interview, Tottenham manager Antonio Conte claimed his side were unlucky to lose given their performance. An xG scoreline of 1.21 versus 1.18 indicates a very even game and seems to support Conte’s view.
However, Liverpool manager Jürgen Klopp claimed that the quality of Mohamed Salah, who scored two goals from three shots with a total xG of 0.67, was the difference in this match. This exposes one of xG’s biggest weaknesses: it takes no account of who the striker or the goalkeeper is. But is this weakness enough to make xG unreliable as a basis for predicting future games?
Football prediction before xG
The obvious piece of data to use when analyzing football is goals. Indeed, this was the only piece of information used in Mark Dixon and Stuart Coles’ 1997 model, which predicted future football matches by assigning each team an attacking rating and a defensive rating.
Dixon-Coles ratings are calculated using the number of goals scored and conceded in previous matches, taking into account the quality of the opponent. The ratings of two teams, along with a boost for home advantage, can then be combined to predict the score of an upcoming match between them.
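As a rough sketch of how ratings can be turned into a prediction, the Python below treats each side's goals as an independent Poisson count whose mean comes from the ratings. This is a simplification rather than the published model, which also adjusts the probabilities of low-scoring results and gives older matches less weight. The function names and rating values are hypothetical, and the ratings are picked by hand instead of being fitted to past results.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals when the expected number is `mean`."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def outcome_probabilities(home_attack, home_defence, away_attack, away_defence,
                          home_advantage, max_goals=10):
    """Return (home win, draw, away win) probabilities.

    Defence values here measure weakness: a higher value concedes more.
    Scorelines above max_goals per side are ignored (a negligible tail).
    """
    home_mean = home_attack * away_defence * home_advantage
    away_mean = away_attack * home_defence
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_mean) * poisson_pmf(a, away_mean)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


# Two evenly matched, hypothetical teams; only home advantage separates them.
home_win, draw, away_win = outcome_probabilities(
    home_attack=1.0, home_defence=1.0,
    away_attack=1.0, away_defence=1.0,
    home_advantage=1.3,
)
print(f"home {home_win:.2f}, draw {draw:.2f}, away {away_win:.2f}")
```

Because the two sides have identical ratings, any gap between the home-win and away-win probabilities in this example comes entirely from the home-advantage factor.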
Given the number of statistics available in football, a model that uses only goals to predict future games may seem extremely simple, but its effectiveness lies in understanding what makes for good statistical analysis: high-quality data, and lots of it.
Goals are the highest quality data available in football prediction, as they are the only thing that directly determines results. This explains why other traditional measures, such as number of shots or percentage of possession, are not used in the Dixon-Coles model.
A shot can be a penalty, which the taker would expect to score, or a speculative effort from long range, but both count equally as shots. Similarly, a team may have a lot of the ball, but not in an area of the pitch that gives it a chance to score.
A statistical study from as far back as 1968 found no link between shots, possession or passing moves and the results of football matches. This supports the idea that goals are the only factor worth considering.
Why can xG be useful?
The weakness of Dixon-Coles lies in the quantity of data. There were 1,071 goals scored in the 2021/22 Premier League season, which might seem like a lot. However, that is just 2.82 goals per game. To make up for this lack of per-game information, Dixon and Coles used three years of data to make their predictions, even though most teams will have made wholesale changes to their playing and management staff during such a period.
Increasing the amount of data available over a shorter time scale is where xG has an advantage over goals alone. Essentially, it is an attempt to find a balance between the quality of goal data and the quantity of shot data. This is a classic conundrum in statistics known as the bias-variance trade-off.
Take the aforementioned Tottenham v Liverpool match. The three goals scored are the only pieces of information the Dixon-Coles model can extract from this match, whereas an xG-based model takes information from all 27 shots taken, regardless of whether they resulted in a goal. However, not considering who is involved in a shot puts a limit on the quality of this xG data.
Despite being 25 years old, the Dixon-Coles model is still the gold standard in football prediction, as shown in a 2022 study. While xG provides good insights into the balance of play in a single match, no xG model has yet been shown to outperform Dixon-Coles at predicting future results.
Until that happens, doubts about xG’s weaknesses will remain, and real goals will hold their ground as the only truly reliable indicator of how good a team is.
Provided by The Conversation
This article has been republished under a Creative Commons license from The Conversation. Read the original article.
Citation: A brief history of statistics in football: Why real goals remain king in predicting who will win (2022, December 30) retrieved 31 December 2022 from https://phys.org/news/2022-12-history-statistics-soccer-real-goals.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.