Koehn AI

Machine Learning and Football

02 September 2020

Up until about 15 years ago, football relied almost entirely on instinct and intuition. This changed with the arrival of a then younger generation of so-called laptop coaches. The term itself signals a drastic change of paradigm: whereas decisions used to be made mainly on gut feeling and professional experience, it now became possible to base them on data. One prominent early example is the note handed to goalkeeper Jens Lehmann right before the penalty shoot-out between Germany and Argentina at the 2006 World Cup.

An abundance of data in professional football

These days, there is an abundance of data in professional football. Sportec Solutions, a subsidiary of the DFL (Deutsche Fußball Liga), produces about 3 million data points per match. Twenty-five times per second, a range of features is recorded for all 22 players on the pitch; among others, these include the coordinates of every player and of the ball. Manually created event data additionally record corner kicks, shots on and off target, bookings, and much more. Data sets like these are sold to TV broadcasters and clubs.
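
As a plausibility check, the quoted figure follows almost directly from the tracking rate. The sketch below assumes 90 minutes of playing time without stoppage time and counts one data point per tracked object (22 players plus the ball) per frame; both assumptions are ours, and the real count depends on what exactly is stored per frame.

```python
# Plausibility check for "about 3 million data points per match".
# Assumptions (ours): 90 minutes of play, no stoppage time, and one data
# point = one tracked object (a player or the ball) in one frame.
FRAMES_PER_SECOND = 25
MATCH_SECONDS = 90 * 60
TRACKED_OBJECTS = 22 + 1  # 22 players plus the ball

frames = FRAMES_PER_SECOND * MATCH_SECONDS
data_points = frames * TRACKED_OBJECTS
print(f"{frames:,} frames -> {data_points:,} object positions")
# prints: 135,000 frames -> 3,105,000 object positions
```

That is roughly 3.1 million positions, in line with the figure above.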

Generating tracking data

Companies such as Sportec Solutions use special cameras, installed at suitable positions around the pitch, that in principle allow ball and players to be tracked precisely. This setup has its limitations: the data sets contain a significant level of noise, and the accuracy is not high enough to meaningfully study ball properties such as swerve and acceleration. The positions of the players' limbs and their vertical coordinates are not recorded. In principle, anyone can start producing football data from suitable broadcast footage: although the images are projections onto two dimensions, most of the information is there. Obviously, this cannot be done manually and calls for computer-vision techniques. The startup Subsequent is building automated services for this, using state-of-the-art techniques.
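
One standard building block for turning broadcast images into pitch coordinates is a planar homography: because the pitch is approximately flat, four or more known landmarks, such as the corners of the penalty area, determine a 3x3 matrix that maps image pixels to metres on the pitch. The sketch below estimates it with the Direct Linear Transform in NumPy. It is our own illustration of the general technique, not a description of how Subsequent or Sportec Solutions work, and the pixel coordinates are made up.

```python
import numpy as np


def estimate_homography(image_pts, pitch_pts):
    """Estimate H (3x3) such that pitch ~ H @ image, via the Direct Linear Transform.

    image_pts, pitch_pts: sequences of (x, y) pairs, at least 4, no 3 collinear.
    """
    rows = []
    for (u, v), (x, y) in zip(image_pts, pitch_pts):
        # Each correspondence gives two linear equations in the 9 entries of H.
        rows.append([-u, -v, -1.0, 0.0, 0.0, 0.0, x * u, x * v, x])
        rows.append([0.0, 0.0, 0.0, -u, -v, -1.0, y * u, y * v, y])
    a = np.asarray(rows, dtype=float)
    # H is the right singular vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(a)
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]  # fix the arbitrary overall scale


def image_to_pitch(h, pts):
    """Map image pixel coordinates (n, 2) to pitch coordinates (n, 2)."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ h.T
    return mapped[:, :2] / mapped[:, 2:3]


# Hypothetical example: the four corners of a penalty area (40.32 m x 16.5 m)
# and made-up pixel positions where they appear in a broadcast frame.
pitch_corners = [(0.0, 0.0), (40.32, 0.0), (40.32, 16.5), (0.0, 16.5)]
pixel_corners = [(210.0, 620.0), (1010.0, 610.0), (905.0, 420.0), (320.0, 425.0)]
H = estimate_homography(pixel_corners, pitch_corners)
player_feet_px = [(600.0, 500.0)]
print(image_to_pitch(H, player_feet_px))  # pitch position in metres
```

In practice, landmark detection, lens distortion, and noise dominate the difficulty, and OpenCV's findHomography function (optionally with RANSAC) is the usual off-the-shelf choice. Only points on the ground plane, such as the players' feet, are mapped correctly, which is one reason why vertical coordinates are hard to recover from a single view.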

Use Cases

Tracking and event data support a wide range of use cases. Insights can be delivered to coaches and analysts at a glance via accessible dashboards with tangible illustrations. The first category is direct statistics about individual players, groups of players, or the team as a whole. These are simple aggregates like the ones you are probably used to seeing on TV, such as success rates of passes, tackles, and shots, both for the team and for individual players. Beyond that, it can be very insightful to construct more abstract statistics, such as goalkeeper metrics, the effectiveness of passes, the infamous packing rates (the number of opponents a pass or dribble takes out of the game), or statistics about set pieces.
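
To make the two kinds of statistics concrete, the sketch below computes per-player pass success rates from event data and a deliberately simplified packing count from positions. The event format, the player names, and the packing rule (an opponent counts as bypassed if their x-coordinate lies between passer and receiver, with the team attacking towards increasing x) are assumptions for illustration; commercial packing definitions are more detailed.

```python
from collections import defaultdict

# Hypothetical event records: (player, event_type, successful)
sample_events = [
    ("A", "pass", True),
    ("A", "pass", False),
    ("B", "pass", True),
    ("B", "shot", False),
    ("B", "pass", True),
]


def pass_success_rates(events):
    """Share of completed passes per player, ignoring all other event types."""
    attempts = defaultdict(int)
    completed = defaultdict(int)
    for player, kind, successful in events:
        if kind != "pass":
            continue
        attempts[player] += 1
        completed[player] += int(successful)
    return {player: completed[player] / attempts[player] for player in attempts}


def simplified_packing(passer_x, receiver_x, opponent_xs):
    """Count opponents bypassed by a forward pass (team attacks towards larger x).

    Simplification (ours): an opponent is bypassed if their x-coordinate at the
    moment of the pass lies strictly between passer and receiver.
    """
    if receiver_x <= passer_x:
        return 0  # backward or square passes bypass nobody in this simple model
    return sum(passer_x < x < receiver_x for x in opponent_xs)


print(pass_success_rates(sample_events))  # {'A': 0.5, 'B': 1.0}
print(simplified_packing(30.0, 62.0, [35.0, 48.0, 55.0, 70.0, 90.0]))  # 3
```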

Machine Learning in Professional Football

The next level of insights will come from machine-learning techniques. A first glimpse of what may arrive in the not-too-distant future is the expected-goals value (xG), the estimated probability that a shot results in a goal, which will be shown in TV broadcasts from the 2020/21 season onwards. Each shot in the current match is compared against the historical shots that the machine-learning model was trained on. Among other properties, the model takes into account the position of the shooter and the pressure exerted by opponents. But machine learning enables many further use cases. For instance, automatically detecting complex patterns will save match analysts a great deal of time, which they can invest in other topics. Let us focus on one specific example.
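
A common baseline for xG models is logistic regression on shot geometry. The sketch below computes two such features, the distance to the centre of the goal and the angle subtended by the goal posts, and fits a logistic regression by gradient descent. It is not the model used in the broadcasts mentioned above: it leaves out opponent pressure and other features, and it is trained on synthetic shots drawn from a made-up "true" model, so the resulting probabilities carry no meaning beyond the illustration.

```python
import math

import numpy as np

GOAL_WIDTH = 7.32  # metres between the posts


def shot_features(x, y):
    """Return (distance, angle) for a shot taken at (x, y).

    x: distance from the goal line in metres (> 0); y: lateral offset from the
    centre of the goal in metres. The angle is the one subtended by the posts.
    """
    distance = math.hypot(x, y)
    angle = abs(math.atan2(y + GOAL_WIDTH / 2, x) - math.atan2(y - GOAL_WIDTH / 2, x))
    return distance, angle


def fit_logistic(features, goals, lr=0.1, steps=3000):
    """Fit a logistic regression by batch gradient descent on standardised features."""
    mean, std = features.mean(axis=0), features.std(axis=0)
    z = (features - mean) / std
    w, b = np.zeros(z.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(z @ w + b)))
        err = p - goals  # gradient of the mean log-loss w.r.t. the logits
        w -= lr * z.T @ err / len(goals)
        b -= lr * err.mean()
    return w, b, mean, std


def predict_xg(model, features):
    """Predicted goal probability (xG) for each row of features."""
    w, b, mean, std = model
    z = (np.asarray(features, dtype=float) - mean) / std
    return 1.0 / (1.0 + np.exp(-(z @ w + b)))


# Synthetic training shots from a made-up "true" model (illustration only).
rng = np.random.default_rng(0)
xs, ys = rng.uniform(3, 35, 3000), rng.uniform(-20, 20, 3000)
X = np.array([shot_features(a, c) for a, c in zip(xs, ys)])
true_logit = 1.0 - 0.15 * X[:, 0] + 1.5 * X[:, 1]
goals = (rng.random(3000) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

model = fit_logistic(X, goals)
central, wide_long = predict_xg(model, [shot_features(11, 0), shot_features(30, 15)])
print(f"xG 11 m out, central: {central:.2f}; 30 m out, wide: {wide_long:.2f}")
```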

Automated tactical analysis of team behavior

It is very time-consuming for analysts to watch match footage and note down the tactical variants an opponent plays. For set pieces, this includes the initial line-up as well as the players' runs in anticipation of the delivery. Based on tracking data, state-of-the-art machine-learning algorithms can perform this process automatically. Technically, this can be done in three steps: encode the spatio-temporal coordinate trajectories of all players using a combination of convolutional and recurrent neural networks; compress this high-dimensional information into a lower-dimensional latent space with an autoencoder; and finally run a clustering algorithm on the latent space.
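
The encode-compress-cluster pipeline can be illustrated with NumPy alone if two components are swapped: PCA stands in for the convolutional/recurrent autoencoder (a linear autoencoder trained on squared reconstruction error learns the same subspace as PCA), and k-means serves as the clustering step. Both choices, the array layout, and the synthetic corner-kick runs towards a near or far post (in hypothetical pitch coordinates) are our assumptions, not the method described above.

```python
import numpy as np


def encode_pca(trajectories, latent_dim):
    """Linear stand-in for the autoencoder: project flattened scenes onto the
    top principal components.

    trajectories: array (n_scenes, n_frames, n_players, 2) of pitch coordinates.
    """
    flat = trajectories.reshape(len(trajectories), -1)
    centred = flat - flat.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:latent_dim].T


def kmeans(points, k, iters=100):
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    centres = [points[0]]
    for _ in range(1, k):
        nearest = np.min([((points - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(points[np.argmax(nearest)])
    centres = np.array(centres)
    for _ in range(iters):
        dists = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels


# Synthetic corner kicks: 5 attackers run from the same start positions either
# towards the near post or towards the far post (plus tracking noise).
rng = np.random.default_rng(1)
n_frames, n_players = 10, 5
t = np.linspace(0.0, 1.0, n_frames)[:, None, None]
start = rng.uniform([30.0, 0.0], [50.0, 16.0], size=(n_players, 2))


def corner_run(target):
    path = start + t * (np.asarray(target) - start)  # straight runs to the target
    return path + rng.normal(0.0, 0.5, path.shape)


scenes = np.array([corner_run([36.0, 2.0]) for _ in range(20)]
                  + [corner_run([44.0, 2.0]) for _ in range(20)])
labels = kmeans(encode_pca(scenes, latent_dim=2), k=2)
print(labels)
```

A real system would replace the PCA step with the learned non-linear encoder; the clustering step stays conceptually the same.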

Defensive Advice Board

The goal is for the system to group similar set-piece strategies into clusters and to build a cluster profile for each opponent. Shown in a suitable dashboard, such a profile lets the coach see at a glance which set-piece strategies the opponent has used and how often. The analysis can be restricted to a user-defined time frame, but it can also detect changes in patterns over the course of a season, e.g. after the appointment of a new head coach or an injury to key players.
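
A cluster profile is essentially a frequency table of cluster labels per opponent and time window. The sketch below builds such profiles and measures how much two of them differ using the total variation distance, one simple way (our choice, not necessarily the one used in practice) to flag a pattern change between, say, the first and second half of a season. The record format and the example data are hypothetical.

```python
from collections import Counter
from datetime import date


def cluster_profile(records, opponent, start, end):
    """Relative frequency of each set-piece cluster for one opponent in [start, end]."""
    counts = Counter(label for team, day, label in records
                     if team == opponent and start <= day <= end)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


def profile_shift(p, q):
    """Total variation distance between two profiles: 0 = identical, 1 = disjoint."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(lab, 0.0) - q.get(lab, 0.0)) for lab in labels)


# Hypothetical records: (opponent, match date, cluster label of one set piece).
records = [
    ("Team X", date(2020, 9, 19), 0),
    ("Team X", date(2020, 9, 19), 0),
    ("Team X", date(2020, 10, 3), 1),
    ("Team X", date(2021, 2, 6), 2),
    ("Team X", date(2021, 2, 13), 2),
    ("Team Y", date(2020, 9, 20), 1),
]
first_half = cluster_profile(records, "Team X", date(2020, 7, 1), date(2020, 12, 31))
second_half = cluster_profile(records, "Team X", date(2021, 1, 1), date(2021, 6, 30))
print(first_half, second_half, profile_shift(first_half, second_half))
```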

Opponent Behavior Simulation

A further emerging technique is the ghosting method. This approach is based on LSTM networks and imitation learning and simulates how a specific team, or an average team, would behave in concrete set-piece scenarios. It makes it possible to answer questions such as "how would team X defend against offensive strategy Y, and how would this influence the set piece’s xG value?" automatically.
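
Ghosting as described relies on LSTMs and imitation learning and needs a deep-learning framework to be done properly. The sketch below only illustrates the core imitation-learning idea, behavioural cloning: learn a policy from observed movements, then roll it out as a "ghost" in a new situation. To keep it self-contained, a linear least-squares policy replaces the LSTM, there is a single attacker and a single defender, the training data are synthetic, and no xG is computed; all of this is an assumption for illustration.

```python
import numpy as np


def fit_policy(attacker, defender):
    """Behavioural cloning with a linear policy fitted by least squares.

    attacker, defender: arrays (n_frames, 2) of observed positions.
    Returns a 2x2 matrix W with defender[t+1] - defender[t] ~ (attacker[t] - defender[t]) @ W.
    """
    offsets = attacker[:-1] - defender[:-1]
    steps = defender[1:] - defender[:-1]
    w, *_ = np.linalg.lstsq(offsets, steps, rcond=None)
    return w


def ghost_rollout(w, attacker, start):
    """Let the learned policy control a 'ghost' defender against a new attacker run."""
    ghost = [np.asarray(start, dtype=float)]
    for a in attacker[:-1]:
        ghost.append(ghost[-1] + (a - ghost[-1]) @ w)
    return np.array(ghost)


# Synthetic demonstration: the observed defender closes 30% of the gap per frame.
t = np.linspace(0.0, 1.0, 50)
attacker = np.column_stack([30.0 + 15.0 * t, 10.0 + 5.0 * np.sin(3.0 * t)])
defender = [np.array([25.0, 5.0])]
for a in attacker[:-1]:
    defender.append(defender[-1] + 0.3 * (a - defender[-1]))
defender = np.array(defender)

W = fit_policy(attacker, defender)
new_run = np.column_stack([40.0 - 10.0 * t, 8.0 + 6.0 * t])
ghost = ghost_rollout(W, new_run, start=[35.0, 2.0])
print(np.round(W, 3))
print(ghost[-1], new_run[-1])
```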


Although conceptually simple, football is an enormously complex game to optimize. While insights from machine learning will never make up for a lack of motivation or character in a team, they will certainly change the way the game is played in the years to come. We will see playing strategies converge towards a higher overall degree of efficiency. The bottom-up approach, in which players decide locally how to act in order to achieve a common outcome, will be replaced by stringent instructions derived top-down. Other team sports such as handball and basketball will also profit from these advances.
