PepCity: analyzing Guardiola’s impact on Manchester City’s performance

Author

Takudzwanashe Michael Mhuru

Published

May 1, 2023

Manchester City Logo

Introduction

Background and Motivation

Manchester City Football Club is considered one of the best football teams in the world. They have won the English Premier League (EPL) four times in the last five seasons and are on track to win a fifth title. They have also won multiple domestic cups. Manchester City haven’t always been a successful club, however, and some fans argue that their recent success has only come as a result of a change in ownership, massive spending and their latest manager, Pep Guardiola. My goal was to investigate Manchester City’s history in the EPL and compare their recent performances under Pep with those under previous managers. Manchester City’s rallying song ‘Blue Moon’ refers to the fact that the team would win away games “once in a blue moon”. I was interested in quantifying how much this has changed in recent seasons.

About the data

For my analysis, I used two datasets sourced from Kaggle in CSV format. One contains all Premier League matches played since the EPL’s inception in 1992 up to the recently ended 2022 season. The other contains tweets about all Premier League teams collected from July to October 2020, a period that covers the run-up to and the opening months of the 2020-2021 season.

Methodology

Wrangling

EPL dataset

The first task was to extract the matches involving Manchester City from the EPL dataset. I then separated home and away games, since teams typically perform better at home than away, and further separated games played before Pep Guardiola became coach (1992-2016) from those played after he became coach (2016-2022). This was achieved using boolean masks. I also wrote a custom function to standardise column names.

# standardise column names: lowercase, punctuation replaced by underscores
import re
import string

def standardise_column_names(df, remove_punct=True):
    # map every punctuation character to a space
    translator = str.maketrans(string.punctuation, ' ' * len(string.punctuation))

    for c in df.columns:
        c_mod = c.lower()
        if remove_punct:
            c_mod = c_mod.translate(translator)
        # join words with underscores, collapse repeats, then trim the end
        c_mod = '_'.join(c_mod.split(' '))
        c_mod = re.sub(r'_+', '_', c_mod).rstrip('_')
        df.rename({c: c_mod}, inplace=True, axis=1)
    return df
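
The home/away and before/after split described above can be sketched with boolean masks in pandas. This is a minimal illustration on a toy table, not the original wrangling code: the column names (season_end_year, home, away, homegoals, awaygoals) and the convention that the 2016-17 season is labelled 2017 are assumptions.

```python
import pandas as pd

# Toy stand-in for the EPL match table; the column names are assumptions.
matches = pd.DataFrame({
    "season_end_year": [2015, 2016, 2017, 2018],
    "home": ["Manchester City", "Arsenal", "Manchester City", "Chelsea"],
    "away": ["Arsenal", "Manchester City", "Chelsea", "Manchester City"],
    "homegoals": [2, 2, 3, 0],
    "awaygoals": [1, 1, 1, 1],
})

TEAM = "Manchester City"
FIRST_PEP_SEASON_END = 2017  # assumption: the 2016-17 season is labelled 2017

# One boolean mask per condition, combined with & (and) and ~ (not).
is_home = matches["home"] == TEAM
is_away = matches["away"] == TEAM
under_pep = matches["season_end_year"] >= FIRST_PEP_SEASON_END

home_before_pep = matches[is_home & ~under_pep]
home_under_pep = matches[is_home & under_pep]
away_before_pep = matches[is_away & ~under_pep]
away_under_pep = matches[is_away & under_pep]
```

Keeping each condition as its own named mask makes the four subsets easy to read and avoids repeating the comparison logic.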

Tweet dataset

The first task was to extract the data relating to Manchester City from the Tweets dataset, as it contained tweets relating to all Premier League clubs. The dataset was organised by file, so a mask on the file column for ManchesterCity returned the required records: nearly 140,000 tweets that had been filtered for the hashtags #ManchesterCity, #Mancity or #MCFC from July to October 2020, a period that captures the end of the pre-season and the first half of the season ending in 2021.

The tweet data naturally contained emojis, which Python reads as Unicode code points. I created a function to remove them so that natural language processing could be performed accurately on the dataset. The function is shown below.

# function to remove emojis
import re

def remove_emojis(text):
    # character class covering the main emoji code point ranges
    emoji_pattern = re.compile("["
        "\U0001F600-\U0001F64F"  # emoticons
        "\U0001F300-\U0001F5FF"  # symbols & pictographs
        "\U0001F680-\U0001F6FF"  # transport & map symbols
        "\U0001F1E0-\U0001F1FF"  # flags (iOS)
                           "]+", flags=re.UNICODE)
    # replace every run of matched emojis with an empty string
    return emoji_pattern.sub('', text)

Analysis

Overall team statistics

W-D-L distribution

Firstly, I wanted to look at the distribution of wins, losses and draws for Manchester City across all games in its Premier League history. The following plots show those results aggregated as percentages by home games, away games and the totals.

From the plots, we can see that Manchester City have won nearly 60% of their home games, 40% of their away games and about 50% of their matches overall. They have drawn about 20% of their home games, nearly 25% of their away games and about 20% overall. They have lost about 20% of their home games (as many as they have drawn), nearly 35% of their away games and 30% of their games overall. These stats are impressive for a team that has been relegated several times in the past.
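
Percentages like these can be derived from a column of match outcomes. The sketch below uses a small made-up table, and the column names venue and result are assumptions rather than the names used in the original notebook.

```python
import pandas as pd

# Toy results from City's point of view; "venue" and "result" are assumed names.
games = pd.DataFrame({
    "venue": ["home", "home", "home", "home", "away", "away", "away", "away"],
    "result": ["W", "W", "W", "D", "W", "D", "L", "L"],
})

def wdl_percentages(results):
    """Return the percentage of wins, draws and losses in a result column."""
    shares = results.value_counts(normalize=True) * 100
    # reindex so that an outcome that never occurs still appears, as 0
    return shares.reindex(["W", "D", "L"], fill_value=0.0).round(1)

home_pct = wdl_percentages(games.loc[games["venue"] == "home", "result"])
away_pct = wdl_percentages(games.loc[games["venue"] == "away", "result"])
total_pct = wdl_percentages(games["result"])
```

Reindexing to a fixed W, D, L order keeps the three series aligned, which is convenient when they are plotted side by side.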

Goals conceded vs. Goals scored per team

I also wanted to visualize the goals scored and conceded by Manchester City against all their opponents in the Premier League to see how they match up.

From the plots, we can see that Manchester City have scored more goals than they have conceded against their opponents in both home and away games. Generally, Manchester City score more goals at home than away and concede fewer at home, which is consistent with the team being more likely to win at home than away.

How good is Pep Guardiola?

I also wanted to investigate the impact of Pep Guardiola on Manchester City’s performances. Of course, it is important to realise that more goes into the team’s statistics than the manager, with factors such as squad depth and signings, opposition squad depth and injuries playing a part. It can be argued, however, that these factors vary per season, per manager and across all teams, so it is reasonable to compare Pep Guardiola’s performance with that of previous managers who had different squads under their watch.

W-D-L distribution

Firstly, I plotted the win, draw and loss distribution of Manchester City for home, away and all games both before and under Pep Guardiola.

Results Before Pep

Results under Pep

From the plots we can see that before Pep Guardiola, Manchester City had won nearly 50% of their home games, 30% of their away games and 40% overall. They had drawn just over 20% of their home games and about 25% of both their away games and their games overall. They had lost about 25% of their home games, over 25% of their away games and nearly 35% of all their games.

Under Pep Guardiola, these stats improved markedly. Manchester City have won nearly 80% of their home games, 70% of their away games and over 70% overall. They have drawn only about 10% of their games at home, away and overall, and they have lost less than 10% of their home games, about 15% of their away games and about 10% overall. This is a remarkable improvement in both home and away games under Pep Guardiola. It seems that Manchester City now win regularly away, and not once in a “Blue Moon”.

Goals scored and conceded before Pep Guardiola

I also wanted to visually compare the goals scored and conceded by Manchester City before and after Pep Guardiola became manager.

Goals scored and conceded under Pep Guardiola

The results are not easily comparable, since the set of Premier League teams changes from season to season and some teams that have historically featured frequently in the Premier League have not been promoted during Pep Guardiola’s time in charge. However, we can still see that under Pep, the margins between goals scored and goals conceded have widened in Manchester City’s favor.

xG via Machine Learning

Traditionally, xG (expected goals) rates a specific shot in a match by the probability that it results in a goal. For example, a shot with an xG value of 0.2 is one that we would generally expect to be converted twice in every 10 attempts.

In this context, xG measures the goals a team is expected to score or concede, given the results of previous encounters with a specified opponent. Here, an xG of 0.2 means that the team is expected to score 0.2 goals on average in that fixture. This might seem reductive, but it is a useful way to see how much a team’s attacking and defensive capacity changes between periods, notwithstanding factors such as chance, injuries, red cards and penalties that also shape the end result of a game.

There is obviously a degree of residual variance between goals and xG over certain time periods given that goals have a unit scale of 1, whereas xG values fall on a scale with decimals. Perhaps chance is accounted for by our decision to round up or down on a generated statistic, a decision that here has been left to the discretion of the reader.

To best learn how much Manchester City’s goal margins have improved under Pep Guardiola, I trained a simple machine learning model to predict the goals expected to be scored and conceded by Manchester City given a specific opponent. I further specified that the model should predict differently for home and away games as we have seen that there are significant differences in performance for games played at home and those played away.
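
The post does not say which model was trained. As a point of reference only, the simplest predictor of this kind is the mean number of goals per opponent and venue, which can be sketched as follows. The data is made up, and every column and function name here is an assumption for illustration.

```python
import pandas as pd

# Made-up match records from City's point of view; all names are assumptions.
history = pd.DataFrame({
    "opponent": ["Arsenal", "Arsenal", "Arsenal", "Chelsea", "Chelsea"],
    "venue": ["home", "home", "away", "home", "home"],
    "goals_for": [3, 1, 0, 2, 0],
    "goals_against": [1, 1, 2, 1, 1],
})

# "Training" this baseline means averaging goals per (opponent, venue) pair.
xg_table = (
    history.groupby(["opponent", "venue"])[["goals_for", "goals_against"]]
    .mean()
    .rename(columns={"goals_for": "xg_for_city",
                     "goals_against": "xg_against_city"})
)

def predict_xg(opponent, venue):
    """Look up expected goals for and against City for one fixture."""
    row = xg_table.loc[(opponent, venue)]
    return float(row["xg_for_city"]), float(row["xg_against_city"])
```

Fitting one such table on the pre-2016 matches and another on the matches since then would give two sets of predictions that can be compared opponent by opponent.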

To get an idea of the model’s output, I selected five teams (Liverpool, Manchester United, Arsenal, Chelsea and Tottenham Hotspur) that have arguably been top teams historically in the Premier League and that have faced Manchester City regularly both before and after Pep Guardiola became manager. I also selected West Ham, Newcastle United, Burnley, Aston Villa and Everton to add balance. The sample results are displayed below.

Expected goals before Pep

Home games

Team              xG against City   xG for City
Arsenal           1.84              1.21
Chelsea           1.21              0.89
Liverpool         1.21              1.37
Manchester Utd    1.26              1.26
Tottenham         1.32              1.53
Aston Villa       0.79              2.63
West Ham          0.60              1.73
Everton           1.05              1.68
Newcastle Utd     0.65              2.35
Burnley           2.50              2.50

Away games

Team              xG against City   xG for City
Arsenal           1.58              0.58
Chelsea           1.84              0.84
Liverpool         2.05              0.89
Manchester Utd    1.79              1.32
Tottenham         1.58              1.21
Aston Villa       1.05              1.16
West Ham          1.53              1.13
Everton           1.37              0.95
Newcastle Utd     1.18              1.35
Burnley           1.00              3.00

Expected goals under Pep

Home games

Team              xG against City   xG for City
Arsenal           0.50              2.83
Chelsea           1.00              2.00
Liverpool         0.83              2.50
Manchester Utd    1.50              1.67
Tottenham         1.33              2.33
Aston Villa       0.67              2.67
West Ham          0.67              2.00
Everton           0.67              2.50
Newcastle Utd     0.40              3.40
Burnley           0.17              3.67

Away games

Team              xG against City   xG for City
Arsenal           0.50              2.17
Chelsea           1.17              1.17
Liverpool         1.83              1.67
Manchester Utd    0.67              1.33
Tottenham         1.33              0.67
Aston Villa       1.00              3.33
West Ham          0.67              3.33
Everton           1.17              2.00
Newcastle Utd     1.40              2.40
Burnley           0.50              2.00

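As we can see, Manchester City’s xG improved under Pep Guardiola against most of these teams. The margin between xG for and xG against City widened in every home fixture in the sample and in most away fixtures, including those against notable teams like Arsenal, Chelsea, Manchester United and Liverpool. The exceptions are the away games at Tottenham and Burnley, where the margin narrowed. For teams like Everton and West Ham away, the xG has been turned around in favor of Manchester City since 2016, when Guardiola took charge.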

Tweet city: does Manchester City have no fans?

There is a general belief in the Premier League that Manchester City have no fans or have ‘plastic fans’. On the other hand, Manchester City fans like myself believe that we are overly hated and spoken negatively about by rival fans. I was interested in seeing what the discourse about Manchester City in this season was and what the general sentiment carried by the tweets was.

Sentiment analysis

To investigate whether Manchester City is really hated, I used the nltk package to calculate the polarity of each tweet and labelled each tweet as negative, positive or neutral based on its score.
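
The labelling step can be sketched as a simple thresholding function. This assumes the polarity score lies between -1 and 1, as the compound score of nltk’s VADER analyzer does. The function name and the cut-off of 0.05 (a commonly used convention for VADER) are assumptions, not values confirmed by the original analysis.

```python
def label_sentiment(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to a sentiment label.

    The threshold is an assumed cut-off, not the one used in the post.
    """
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

# Hypothetical scores standing in for per-tweet polarity values.
scores = [0.64, -0.42, 0.0, 0.03]
labels = [label_sentiment(s) for s in scores]
```

A wider threshold sends more tweets to the neutral class, so the share of positive and negative tweets depends on this choice.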

From the plot of the sentiment distribution, we can see that most of the tweets were positive or neutral and a small number of them were negative. This means that, contrary to my belief, people generally speak more positively than negatively in their discourse about Manchester City on Twitter, at least in this time period. We could argue that Manchester City has a lot of fans, given that the conversations are highly positive. If not, then rival fans engage in fan behavior towards Manchester City ;)

Distribution of tweets captured

To truly capture the nature of the discourse about Manchester City, I also wanted to see how the frequency of tweets varied over the time period captured, aggregated by sentiment. The plot below illustrates this.

From the plot, we can see that the highest volumes of tweets were generated in the later part of the pre-season: 2741 tweets on July 19, when Manchester City lost the FA Cup semi-final to Arsenal, and 2704 tweets on July 22, when Manchester City announced interest in signing the then Bournemouth defender, Nathan Ake. The number of tweets fluctuated but dropped to its lowest in September (25 tweets on September 19). Positive tweets were generally more frequent over the time period and always outnumbered negative tweets. We can argue that, on any given day, people spoke more positively than negatively about Manchester City.

It is important to note that negative tweets may also come from part of Manchester City’s fan base, given the emotions that follow losing a game or exiting a competition. Some people’s tweets might also feature frequently over this time period, and rivals and neutrals tweet about the club too, so this is by no means an indicator of the number of fans Manchester City has.
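
Daily counts per sentiment like the ones discussed above can be obtained by grouping the tweets by calendar day and label. The sketch below uses a handful of made-up tweets, and the column names created_at and sentiment are assumptions.

```python
import pandas as pd

# Made-up tweets; "created_at" and "sentiment" are assumed column names.
tweets = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2020-07-19 18:05", "2020-07-19 20:30", "2020-07-19 21:10",
        "2020-07-22 09:00", "2020-07-22 12:45",
    ]),
    "sentiment": ["positive", "negative", "positive", "positive", "neutral"],
})

# Count tweets per calendar day and sentiment, one column per label.
daily_counts = (
    tweets.groupby([tweets["created_at"].dt.date, "sentiment"])
    .size()
    .unstack(fill_value=0)
)

# Day with the highest total number of tweets across all sentiments.
busiest_day = daily_counts.sum(axis=1).idxmax()
```

The resulting table has one row per day and one column per sentiment, which maps directly onto a line plot with one line per label.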

Wordcloud

I also generated a wordcloud so that I could see what words featured frequently in the tweets and gain insights on some of the conversations.

From the word cloud we can see that some of the more popular names that featured are Ferran Torres, (Kevin) de Bruyne, Pep Guardiola, David Silva and Lionel Messi. Some club names were also part of the tweets, including Real Madrid, Barcelona, Chelsea and Valencia. Some football terms also showed up, such as FA Cup, Champions League, player transfers and away kit.

Conclusion & Limitations

There is no doubt that Manchester City have improved in their style of play and in their performance. However, there are a lot of factors that affect the performance of a football team. To accurately portray Manchester City’s improvement under Pep Guardiola, some of these factors need to be taken into account.

Future work

I hope to build on this project by gathering other stats, for example from other domestic competitions like the FA Cup and the League Cup, to see how Manchester City have fared in competitions outside the Premier League. I also want to look into tactical analysis to see how possession, defense and attack have evolved for Manchester City compared to previous managers. Lastly, I want to extend the project to look at each individual manager in Manchester City’s Premier League history so that their individual contributions to the team’s managerial history can be clearly seen. It would also be interesting to extend the project to teams like Manchester United, who have been in the Premier League for a longer time and whose performances have regressed in recent times.

References

Datasets

Premier League dataset: https://www.kaggle.com/datasets/evangower/premier-league-matches-19922022

Twitter dataset: https://www.kaggle.com/datasets/wjia26/epl-teams-twitter-sentiment-dataset

Python packages

Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., … Oliphant, T. E. (2020). Array programming with NumPy. Nature, 585, 357–362. https://doi.org/10.1038/s41586-020-2649-2

McKinney, W., & others. (2010). Data structures for statistical computing in python. In Proceedings of the 9th Python in Science Conference (Vol. 445, pp. 51–56).

Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), 90–95.

Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc.

Pérez, F., & Granger, B. E. (2007). IPython: a system for interactive scientific computing. Computing in Science & Engineering, 9(3).

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … others. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct), 2825–2830.

Oesper, L., Merico, D., Isserlin, R., & Bader, G. D. (2011). WordCloud: a Cytoscape plugin to create a visual semantic summary of networks. Source Code for Biology and Medicine, 6(1), 7.

Van Rossum, G. (2020). The Python Library Reference, release 3.8.2. Python Software Foundation.