Installing the Statsbomb Python Library

I have started using Python more and wanted to find a way of creating Jupyter notebooks in VS Code and converting them to Markdown files for my Blogdown site, which I build in RStudio. It didn’t take me long to find a really simple tutorial on how to do this. For this tutorial, I am going to install the Statsbomb Python library, statsbombpy, using a zip folder I downloaded from their GitHub page.

Before I get started, I should mention that I like to work with Jupyter Notebooks within my VS Code environment. This might not be normal, I’m not sure, but for those who are interested you can find how to set this up on your device here. In this type of environment, my terminal working directory always points to where the current file is saved, or where my VS Code project is based. This makes it easy to reference data files saved within the working directory.

Once I had downloaded the statsbombpy zip file from GitHub, I added the unzipped folder to my current Jupyter notebook working directory. In the terminal, I moved into that folder using cd statsbombpy-master, then ran pip install . to install the statsbombpy library.
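
To confirm the install worked before opening a notebook, a quick check along these lines can help. This is only a minimal sketch using the standard library: the helper name installed_version is purely illustrative, and it assumes the package is registered with pip under the name statsbombpy.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Assumption: the library is registered with pip under the name "statsbombpy".
found = installed_version("statsbombpy")
if found is None:
    print("statsbombpy is not installed in this environment")
else:
    print(f"statsbombpy {found} is installed")
```

If this reports that the library is missing, the most likely cause is that pip install . was run in a different Python environment from the one the notebook kernel uses.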

Now that we have the library installed, let’s see how easy it is to pull the free competitions into our notebook.

### First we must import the relevant library.
from statsbombpy import sb

### Then we can call all the free competitions
comps = sb.competitions()
comps.head(5)
credentials were not supplied. open data access only
competition_id season_id country_name competition_name competition_gender season_name match_updated match_available
0 37 42 England FA Women’s Super League female 2019/2020 2020-03-11T14:09:41.932138 2020-03-11T14:09:41.932138
1 37 4 England FA Women’s Super League female 2018/2019 2020-02-27T15:59:58.148 2020-02-27T15:59:58.148
2 43 3 International FIFA World Cup male 2018 2019-12-16T23:09:16.168756 2019-12-16T23:09:16.168756
3 11 4 Spain La Liga male 2018/2019 2020-02-27T12:19:39.458017 2020-02-27T12:19:39.458017
4 11 1 Spain La Liga male 2017/2018 2020-02-27T12:19:39.458017 2020-02-27T12:19:39.458017

We can then find the matches using the matches function from the statsbombpy library. Let’s do this for the 2019/2020 season of the FA WSL.

### Find the free matches from a league in the competitions table above
## Add the competition id below.
comp = 37

## Add the season id below
season = 42

### Run the matches function to pull all the matches from the competition and season. 
matches = sb.matches(competition_id = comp, season_id = season)
matches.head(5)

credentials were not supplied. open data access only

match_id match_date kick_off competition season home_team away_team home_score away_score match_status last_updated match_week competition_stage stadium referee data_version shot_fidelity_version xy_fidelity_version
0 2275038 2020-02-12 20:30:00.000 England - FA Women’s Super League 2019/2020 Reading WFC West Ham United LFC 2 0 available 2020-02-14T17:43:49.368 16 Regular Season Adams Park A. Bryne 1.1.0 2 2
1 2275037 2020-02-02 15:00:00.000 England - FA Women’s Super League 2019/2020 Manchester City WFC Arsenal WFC 2 1 available 2020-02-04T17:25:33.263 14 Regular Season Academy Stadium S. Pearson 1.1.0 2 2
2 2275027 2020-02-02 15:00:00.000 England - FA Women’s Super League 2019/2020 Brighton & Hove Albion WFC Everton LFC 1 0 available 2020-02-04T17:28:02.434 14 Regular Season NaN A. Bryne 1.1.0 2 2
3 2275030 2020-02-23 15:00:00.000 England - FA Women’s Super League 2019/2020 Brighton & Hove Albion WFC Tottenham Hotspur Women 0 1 available 2020-02-26T15:02:00.122 17 Regular Season NaN L. Saunders 1.1.0 2 2
4 2275120 2019-09-08 15:00:00.000 England - FA Women’s Super League 2019/2020 Birmingham City WFC Everton LFC 0 1 available 2019-12-16T23:09:16.168756 1 Regular Season SportNation.bet Stadium E. Swallow 1.1.0 2 2

As easy as that, we have all the matches available in the Statsbomb free data set for that competition and season, detailing everything we might want to know about each match. If we change the comp and season values and supply them to the matches function, we can get the details from a different competition or season very quickly.
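
Rather than reading the ids off the table by eye, they can also be looked up by name. The sketch below is one hedged way of doing that: find_ids is a hypothetical helper (not part of statsbombpy) and it works on plain row dictionaries, here a few rows copied from the competitions output above.

```python
def find_ids(rows, competition_name, season_name):
    """Return (competition_id, season_id) for the first matching row, or None."""
    for row in rows:
        if (row["competition_name"] == competition_name
                and row["season_name"] == season_name):
            return row["competition_id"], row["season_id"]
    return None


# Sample rows copied from the competitions output above.
sample_rows = [
    {"competition_id": 43, "season_id": 3,
     "competition_name": "FIFA World Cup", "season_name": "2018"},
    {"competition_id": 11, "season_id": 4,
     "competition_name": "La Liga", "season_name": "2018/2019"},
    {"competition_id": 11, "season_id": 1,
     "competition_name": "La Liga", "season_name": "2017/2018"},
]

ids = find_ids(sample_rows, "La Liga", "2018/2019")
print(ids)  # (11, 4)
```

With the real table, the rows could come from comps.to_dict("records"), and the resulting pair could then be passed on as sb.matches(competition_id = ids[0], season_id = ids[1]).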

But who played for each team in these matches? We can find that too, using the lineups function supplied in the library.

### Run the lineups function to get the lineups for each team in a given match. 
## Add match_id here
match = 2275038

## Run the function and keep only the West Ham United LFC lineup
lineup = sb.lineups(match_id = match)['West Ham United LFC']
lineup.head(10)

credentials were not supplied. open data access only

player_id player_name player_nickname jersey_number country
0 8297 Adriana Leon None 19 Canada
1 15421 Kenza Dali None 21 France
2 18146 Leanne Kiernan None 8 Ireland
3 18147 Kate Longhurst None 12 England
4 18150 Julia Simic None 10 Germany
5 18151 Gilly Louise Scarlett Flaherty Gilly Flaherty 5 England
6 18153 Alisha Lehmann None 7 Switzerland
7 22027 Anne Moorhouse None 1 England
8 23217 Tessel Middag None 23 Netherlands
9 31553 Cecilie Redisch Kvamme None 2 Norway

The output of this call is a little different: rather than a single table, it is keyed by team name like a dictionary, so it needs to be subset to print nicely in Markdown. This is very easy in Python. By adding ['West Ham United LFC'] to the end of the call, we selected just the West Ham lineup data.
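
To see both teams at once instead of picking one key, the result can be looped over like any other dictionary. The sketch below illustrates the idea with stand-in data: squad_sizes is a hypothetical helper, plain lists take the place of the per-team tables, and the Reading entries are placeholders rather than real players.

```python
def squad_sizes(lineups):
    """Map each team name to the number of players listed for it."""
    return {team: len(players) for team, players in lineups.items()}


# Stand-in data: plain lists in place of the per-team tables, with a few of
# the West Ham names from the output above and placeholder Reading entries.
example_lineups = {
    "Reading WFC": ["Player A", "Player B"],
    "West Ham United LFC": ["Adriana Leon", "Kenza Dali", "Leanne Kiernan"],
}

for team, size in squad_sizes(example_lineups).items():
    print(team, size)
```

Calling squad_sizes(sb.lineups(match_id = match)) should behave the same way, assuming each value is a table whose len() is its number of rows.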

Lastly and most importantly, the event data can be called using one of two event functions in the library, either on a given match or an entire league.

sb.events will call the events for a given match, by passing the match id to the function. sb.competition_events will get all the events from a specified league; the details can be found on the library’s GitHub page.

### Run function to call the events from a single match
## Run the event function using the assigned match from above
match_events = sb.events(match_id = match)
match_events.head(5)

credentials were not supplied. open data access only

50_50 bad_behaviour ball_receipt ball_recovery block carry clearance counterpress dribble duel possession_team related_events second shot substitution tactics team timestamp type under_pressure
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Reading WFC NaN 0 NaN NaN {'formation': 41212, 'lineup': [{'player': {'i… Reading WFC 00:00:00.000 Starting XI NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Reading WFC NaN 0 NaN NaN {'formation': 4231, 'lineup': [{'player': {'id… West Ham United LFC 00:00:00.000 Starting XI NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Reading WFC [035f18f5-8767-475f-b96b-b1548c2fd642] 0 NaN NaN NaN West Ham United LFC 00:00:00.000 Half Start NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Reading WFC [da9c5398-dae9-4a3d-b821-fd600b54a55d] 0 NaN NaN NaN Reading WFC 00:00:00.000 Half Start NaN
4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Reading WFC [f0bd2ba7-a946-4414-b04f-aeeae0928f31] 0 NaN NaN NaN West Ham United LFC 00:00:00.000 Half Start NaN

5 rows × 41 columns
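
A quick way to get a feel for a table this wide is to count how often each event type appears. The sketch below uses only the standard library, with the five type values printed above as stand-in data; count_event_types is a hypothetical helper name.

```python
from collections import Counter


def count_event_types(types):
    """Count how often each event type occurs, most common first."""
    return Counter(types).most_common()


# Stand-in for the "type" column, using the five rows printed above.
example_types = ["Starting XI", "Starting XI", "Half Start", "Half Start", "Half Start"]

print(count_event_types(example_types))
# [('Half Start', 3), ('Starting XI', 2)]
```

On the real data, count_event_types(match_events["type"]) should give the same kind of summary, and match_events["type"].value_counts() is the pandas equivalent.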

As we can see, there is a lot of information to be found in the event files, which will require a lot of data transformation before we can use this effectively. This will be the aim of my next tutorial in Python.
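
As a small taste of that kind of transformation, the formation stored in the tactics column can be turned into a readable string. This is only a sketch under stated assumptions: starting_formations is a hypothetical helper, the rows are plain dictionaries based on the first events printed above, and the lineup lists are left empty because the output truncates them.

```python
def starting_formations(rows):
    """Map each team to its starting formation, e.g. 41212 -> '4-1-2-1-2'."""
    formations = {}
    for row in rows:
        # Only "Starting XI" events carry a tactics dictionary.
        if row["type"] == "Starting XI":
            digits = str(row["tactics"]["formation"])
            formations[row["team"]] = "-".join(digits)
    return formations


# Stand-in rows based on the first three events printed above.
example_rows = [
    {"type": "Starting XI", "team": "Reading WFC",
     "tactics": {"formation": 41212, "lineup": []}},
    {"type": "Starting XI", "team": "West Ham United LFC",
     "tactics": {"formation": 4231, "lineup": []}},
    {"type": "Half Start", "team": "West Ham United LFC", "tactics": None},
]

print(starting_formations(example_rows))
# {'Reading WFC': '4-1-2-1-2', 'West Ham United LFC': '4-2-3-1'}
```

With the real events table, the rows could be produced with match_events.to_dict("records").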

I hope to add more Python tutorials in the next little while, but until then, stay safe out there!

Josh Trewin
Data Scientist

I’m a data scientist, learning my way through R / Python and applying to football data from StatsBomb, provided for free through GitHub. Follow my journey on here or Twitter to find out when I add new content.
