Installing the Statsbomb Python Library
I have started using Python more and wanted to find a way of creating Jupyter notebooks in VS Code and converting them to Markdown files for my Blogdown site, which I build in RStudio. It didn’t take me long to find a really simple tutorial on how to do this. For this tutorial, I am going to download and install the Statsbomb Python library, using a zip folder I downloaded from their GitHub page.
Before I get started, I should mention that I like to work with Jupyter notebooks within my VS Code environment. I’m not sure how common this is, but for those who are interested, you can find out how to set this up on your device here. In this type of environment, my terminal working directory always points to where the current file is saved, or where my VS Code project is based. This makes it easy to reference data files saved within the working directory.
Once I had downloaded the statsbombpy zip file from GitHub, I extracted it into my current Jupyter notebook working directory. In the terminal, I changed my working directory using `cd statsbombpy-master`, then ran `pip install .` to install the statsbombpy library.
Now that we have the library installed, let’s see how easy it is to pull the free competitions into our notebook.
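If you want to confirm the install worked before going any further, a quick check like the one below can help. This is only a minimal sketch: `is_installed` is a helper name I made up for illustration, and it needs to be run with the same Python environment that your notebook uses.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("statsbombpy"):
    print("statsbombpy is ready to import")
else:
    print("statsbombpy was not found; try running pip install . again "
          "from inside the statsbombpy-master folder")
```

If the second message appears, the most likely cause is that the terminal and the notebook are pointing at different Python environments.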
### First we must import the relevant library.
from statsbombpy import sb
### Then we can now call all free competitions
comps = sb.competitions()
comps.head(5)
credentials were not supplied. open data access only
| | competition_id | season_id | country_name | competition_name | competition_gender | season_name | match_updated | match_available |
|---|---|---|---|---|---|---|---|---|
| 0 | 37 | 42 | England | FA Women’s Super League | female | 2019/2020 | 2020-03-11T14:09:41.932138 | 2020-03-11T14:09:41.932138 |
| 1 | 37 | 4 | England | FA Women’s Super League | female | 2018/2019 | 2020-02-27T15:59:58.148 | 2020-02-27T15:59:58.148 |
| 2 | 43 | 3 | International | FIFA World Cup | male | 2018 | 2019-12-16T23:09:16.168756 | 2019-12-16T23:09:16.168756 |
| 3 | 11 | 4 | Spain | La Liga | male | 2018/2019 | 2020-02-27T12:19:39.458017 | 2020-02-27T12:19:39.458017 |
| 4 | 11 | 1 | Spain | La Liga | male | 2017/2018 | 2020-02-27T12:19:39.458017 | 2020-02-27T12:19:39.458017 |
We can then find the matches using the matches function from the statsbombpy library. Let’s do this for the 2019/2020 season of the FA WSL.
### Find the free matches from a league in the competitions table above
## Add the competition id below.
comp = 37
## Add the season id below
season = 42
### Run the matches function to pull all the matches from the competition and season.
matches = sb.matches(competition_id = comp, season_id = season)
matches.head(5)
credentials were not supplied. open data access only
| | match_id | match_date | kick_off | competition | season | home_team | away_team | home_score | away_score | match_status | last_updated | match_week | competition_stage | stadium | referee | data_version | shot_fidelity_version | xy_fidelity_version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2275038 | 2020-02-12 | 20:30:00.000 | England - FA Women’s Super League | 2019/2020 | Reading WFC | West Ham United LFC | 2 | 0 | available | 2020-02-14T17:43:49.368 | 16 | Regular Season | Adams Park | A. Bryne | 1.1.0 | 2 | 2 |
| 1 | 2275037 | 2020-02-02 | 15:00:00.000 | England - FA Women’s Super League | 2019/2020 | Manchester City WFC | Arsenal WFC | 2 | 1 | available | 2020-02-04T17:25:33.263 | 14 | Regular Season | Academy Stadium | S. Pearson | 1.1.0 | 2 | 2 |
| 2 | 2275027 | 2020-02-02 | 15:00:00.000 | England - FA Women’s Super League | 2019/2020 | Brighton & Hove Albion WFC | Everton LFC | 1 | 0 | available | 2020-02-04T17:28:02.434 | 14 | Regular Season | NaN | A. Bryne | 1.1.0 | 2 | 2 |
| 3 | 2275030 | 2020-02-23 | 15:00:00.000 | England - FA Women’s Super League | 2019/2020 | Brighton & Hove Albion WFC | Tottenham Hotspur Women | 0 | 1 | available | 2020-02-26T15:02:00.122 | 17 | Regular Season | NaN | L. Saunders | 1.1.0 | 2 | 2 |
| 4 | 2275120 | 2019-09-08 | 15:00:00.000 | England - FA Women’s Super League | 2019/2020 | Birmingham City WFC | Everton LFC | 0 | 1 | available | 2019-12-16T23:09:16.168756 | 1 | Regular Season | SportNation.bet Stadium | E. Swallow | 1.1.0 | 2 | 2 |
As easy as that, we have all the matches available in the Statsbomb free data set, detailing everything we might want to know about the specific matches. If we change the comp and season values and supply them to the matches function, we can get the details from a different competition or season very quickly.
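Rather than reading the ids off the table by eye, you could look them up by name. The sketch below is one hedged way to do it: `find_ids` is a hypothetical helper of my own, and it works on a plain list of dictionaries so it runs without the library. With the real data, `comps.to_dict("records")` should give the same list-of-dictionaries shape.

```python
def find_ids(records, competition_name, season_name):
    """Return (competition_id, season_id) for the first matching row, or None."""
    for row in records:
        if (row["competition_name"] == competition_name
                and row["season_name"] == season_name):
            return row["competition_id"], row["season_id"]
    return None


# Stand-in rows copied from the competitions table above (a few columns only).
sample_rows = [
    {"competition_id": 37, "season_id": 42,
     "competition_name": "FA Women's Super League", "season_name": "2019/2020"},
    {"competition_id": 43, "season_id": 3,
     "competition_name": "FIFA World Cup", "season_name": "2018"},
    {"competition_id": 11, "season_id": 1,
     "competition_name": "La Liga", "season_name": "2017/2018"},
]

comp, season = find_ids(sample_rows, "La Liga", "2017/2018")
print(comp, season)  # 11 1
```

The resulting `comp` and `season` values can then be passed to `sb.matches` exactly as before.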
But who played for each team in these matches? We can find that too, using the lineups function supplied in the library.
### Run the lineups function to get the lineups for each team in a given match.
## Add match_id here
match = 2275038
## Run the function and keep only the West Ham lineup
lineup = sb.lineups(match_id = match)['West Ham United LFC']
lineup.head(10)
credentials were not supplied. open data access only
| | player_id | player_name | player_nickname | jersey_number | country |
|---|---|---|---|---|---|
| 0 | 8297 | Adriana Leon | None | 19 | Canada |
| 1 | 15421 | Kenza Dali | None | 21 | France |
| 2 | 18146 | Leanne Kiernan | None | 8 | Ireland |
| 3 | 18147 | Kate Longhurst | None | 12 | England |
| 4 | 18150 | Julia Simic | None | 10 | Germany |
| 5 | 18151 | Gilly Louise Scarlett Flaherty | Gilly Flaherty | 5 | England |
| 6 | 18153 | Alisha Lehmann | None | 7 | Switzerland |
| 7 | 22027 | Anne Moorhouse | None | 1 | England |
| 8 | 23217 | Tessel Middag | None | 23 | Netherlands |
| 9 | 31553 | Cecilie Redisch Kvamme | None | 2 | Norway |
The output of this call is a little different, and needs to be subset to print nicely in Markdown. The call returns a dictionary keyed by team name, so this is very easy in Python: by adding `['West Ham United LFC']` to the end of the call, we were able to pull out all the West Ham lineup data.
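Because the subset relies on the team name matching exactly, a small wrapper that lists the available names when a lookup fails can save some head scratching. This is only a sketch: `get_team_lineup` is a hypothetical helper, and a plain dictionary of lists stands in for the object that `sb.lineups` returns, so that the snippet runs without the library.

```python
def get_team_lineup(lineups, team_name):
    """Return one team's lineup, with a helpful error if the name is not a key."""
    if team_name not in lineups:
        available = ", ".join(sorted(lineups))
        raise KeyError(f"{team_name!r} not found; available teams: {available}")
    return lineups[team_name]


# Stand-in for sb.lineups(match_id=match): team names map to that team's players.
# The real values are tables; short lists are used here purely for illustration.
lineups = {
    "Reading WFC": [],  # rows left out of this stand-in
    "West Ham United LFC": ["Adriana Leon", "Kenza Dali", "Leanne Kiernan"],
}

print(sorted(lineups))  # the team names you can subset by
print(get_team_lineup(lineups, "West Ham United LFC"))
```

Asking for `"West Ham"` instead of the full name would raise a `KeyError` that lists both valid team names.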
Lastly and most importantly, the event data can be called using one of two event functions in the library, either on a given match or an entire league.
`sb.events` returns the events for a given match when you pass it the match id. `sb.competition_events` returns all the events from a specified league; the details can be found on the library’s GitHub page.
### Run function to call the events from a single match
## Run the event function using the assigned match from above
match_events = sb.events(match_id = match)
match_events.head(5)
credentials were not supplied. open data access only
| | 50_50 | bad_behaviour | ball_receipt | ball_recovery | block | carry | clearance | counterpress | dribble | duel | … | possession_team | related_events | second | shot | substitution | tactics | team | timestamp | type | under_pressure |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | … | Reading WFC | NaN | 0 | NaN | NaN | {'formation': 41212, 'lineup': [{'player': {'i… | Reading WFC | 00:00:00.000 | Starting XI | NaN |
| 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | … | Reading WFC | NaN | 0 | NaN | NaN | {'formation': 4231, 'lineup': [{'player': {'id… | West Ham United LFC | 00:00:00.000 | Starting XI | NaN |
| 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | … | Reading WFC | [035f18f5-8767-475f-b96b-b1548c2fd642] | 0 | NaN | NaN | NaN | West Ham United LFC | 00:00:00.000 | Half Start | NaN |
| 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | … | Reading WFC | [da9c5398-dae9-4a3d-b821-fd600b54a55d] | 0 | NaN | NaN | NaN | Reading WFC | 00:00:00.000 | Half Start | NaN |
| 4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | … | Reading WFC | [f0bd2ba7-a946-4414-b04f-aeeae0928f31] | 0 | NaN | NaN | NaN | West Ham United LFC | 00:00:00.000 | Half Start | NaN |
5 rows × 41 columns
As we can see, there is a lot of information to be found in the event files, which will require a lot of data transformation before we can use this effectively. This will be the aim of my next tutorial in Python.
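As a small taste of the kind of transformation I mean, here is a minimal sketch that counts the event types and tidies up the formation codes. Both helper names are my own, and the records are stand-ins built from the five rows printed above, so this runs without the library. With the real data, `match_events.to_dict("records")` should give a similar shape.

```python
from collections import Counter


def count_event_types(events):
    """Count how many times each event type appears in a list of event records."""
    return Counter(event["type"] for event in events)


def formation_label(code):
    """Turn a numeric formation code such as 4231 into the string '4-2-3-1'."""
    return "-".join(str(code))


# Stand-in records built from the rows printed above (only three fields kept).
# In the real table, empty tactics cells hold NaN rather than None.
sample_events = [
    {"type": "Starting XI", "team": "Reading WFC", "tactics": {"formation": 41212}},
    {"type": "Starting XI", "team": "West Ham United LFC", "tactics": {"formation": 4231}},
    {"type": "Half Start", "team": "West Ham United LFC", "tactics": None},
    {"type": "Half Start", "team": "Reading WFC", "tactics": None},
    {"type": "Half Start", "team": "West Ham United LFC", "tactics": None},
]

print(count_event_types(sample_events))

# Only the Starting XI rows carry a tactics dictionary, so skip everything else.
formations = {
    event["team"]: formation_label(event["tactics"]["formation"])
    for event in sample_events
    if isinstance(event["tactics"], dict)
}
print(formations)
```

The `isinstance` check is there so that rows without tactics are skipped whether the empty cell is `None` or `NaN`.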
I hope to add more Python tutorials in the next little while, but until then, stay safe out there!