Statistical analysis of the season to date | Page 4 | Vital Football

According to some media reports, Forest have signed up to DRIBLAB and are basing their transfer strategy on it. Does anyone know anything about them and how it works? Is it based on things like xG, but at player level, or what?

I don't know them in any detail, but they keep metrics on players, almost like real-time Championship Manager stats. I'm sure xG will be used, but only as one of a multitude of analytical and predictive tools.
 
Exactly.

If someone's using all those data points to spot problems and solve them, improving a team's performance, fine.

Creating pretend league tables based on xG stats is misusing the tool, in that case!

And I still suggest that interpreting the data is subjective, no matter how many data points you have. Raw data might tell you a lot about quantity without judging quality.

So 'Ameobi turned well eight yards out but shot tamely straight at the keeper' contains a value-judgement - the shot could have been harder and/or better-placed. I'd imagine the chance would rate highly as an 'expected goal', but the stats alone wouldn't tell you whether it was a brilliant save or a poor shot.

And would they tell you how the chance was created? If Ameobi's turn was a piece of rare skill that would outwit most defenders, there would be little point in the Brentford coaching staff using xG to try and prevent such chances.

So it's not as simple as 'objective stats good, subjective bias bad'. You can't possibly rank goalscoring chances purely by distance from goal, number of defenders back, etc - any realistic approach includes judging how well the players involved did their jobs at the crucial moment.

On that basis, I'd challenge the view that we're not creating enough. Yes, we could be more fluent and create more shots on goal from open play. We don't knock the ball around as a team in quite the way Leeds, West Brom, Fulham or indeed Brentford do. Give it time, it should come as the players get more used to each other's game.

But some of the long passes against Brentford were superb. Watson pinged a couple out to the wings, Samba found Ribeiro well...

...and Carvalho and Chema both came close to sending Grabban clean through with exquisite passes curled over the top of defenders. On another day (the difference may be impossible to capture with stats!) he gets free and scores one of those, Ameobi scores one of his two chances, Silva scores with either the free kick or the shot.

Maybe I'll have to grudgingly dive into the world of xG and see how it analysed Forest v Brentford. If it tells me Brentford should have won, I rest my case.

Again, pulling up small isolated incidents is just you showing your bias and not understanding that this is macro and not micro. It's a useful tool to give you an overview, but you need to stop thinking that because it doesn't quantify exactly it is without merit.

As for Brentford, statistics have helped them to recruit well consistently and outperform clubs of a similar and larger size.
 
They're hardly 'small isolated incidents' in terms of 'expected goals' (which means nothing unless it bears some relation to chances created!)

You need to lose your obsession with me showing 'bias' when I'm just pointing out that the methodology must involve human judgement at some stage.

Nothing can quantify exactly, beyond simple stuff like how many throw-ins, corners, etc each side has had. Once you get on to chances created, there's room for debate about how many chances and how clear-cut they were. In ranking them in order of merit (from 'should have scored' to 'would have been a wonder goal'), something beyond simple statistical analysis is required.
 
The irony is that the actual league table gives the 'macro' overview. Forest have scored 16 and conceded 9, same as Swansea, Leeds 15-7. Brentford are 9-9.

So any 'expected goals' table which puts Brentford 3rd and Forest 17th has tried to pin down the detail but got it badly wrong. I've found the graphic now, and it gives us goals of 11.5 - 12.8, Brentford 14.8 - 8.4. Leeds 22.4 - 9.2, way out ahead on 30 points!

Clearly some work needed on the model before it predicts real goals and results accurately, however useful it might be in other respects.
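
To put numbers on that comparison, here is a minimal Python sketch that lines up the actual goals for and against quoted above with the xG figures from the graphic. The dictionary layout and the function names are made up for illustration, and only the three teams with both sets of figures in this thread are included.

```python
# Goals for / goals against as quoted in the posts above.
# "actual" comes from the league table, "expected" from the xG graphic.
teams = {
    "Forest":    {"actual": (16, 9), "expected": (11.5, 12.8)},
    "Brentford": {"actual": (9, 9),  "expected": (14.8, 8.4)},
    "Leeds":     {"actual": (15, 7), "expected": (22.4, 9.2)},
}


def goal_difference(goals_for, goals_against):
    """Goals scored minus goals conceded."""
    return goals_for - goals_against


def summarise(table):
    """Return actual GD, xG GD and the gap between them for each team."""
    rows = {}
    for name, figures in table.items():
        actual_gd = goal_difference(*figures["actual"])
        expected_gd = goal_difference(*figures["expected"])
        rows[name] = {
            "actual_gd": actual_gd,
            "expected_gd": round(expected_gd, 1),
            # Positive gap: the team is doing better than its xG suggests.
            "gap": round(actual_gd - expected_gd, 1),
        }
    return rows


summary = summarise(teams)
for name, row in summary.items():
    print(
        f"{name:<10} actual GD {row['actual_gd']:+d}  "
        f"xG GD {row['expected_gd']:+.1f}  gap {row['gap']:+.1f}"
    )
```

On these figures Forest are about 8 goals of goal difference ahead of their xG, while Brentford and Leeds are roughly 6 and 5 behind theirs.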
 
The fact you just keep bringing up the odd incident shows you can't grasp this. As I said I'm not going to teach someone unwilling and incapable. Just accept you don't understand analytics and move on.
 
But I didn't ask you to teach me. I hoped to teach you that stats won't ever give you that perfect objectivity you crave.

If I list some of the main chances created by Forest in a game, that's not 'the odd incident'! Chances created must be relevant to a system called 'expected goals' which claims to accurately predict numbers of goals scored and conceded.

It's as if you can't accept Forest's chances as relevant when described in words, only when number-crunched to either 0.57 or 0.83 of a goalscoring chance. Or is xG actually vaguer than that, preferring to reward teams for stringing a few passes together in the opposition half?

Hardly matters if you're the high priest who understands it all and can condescend to us lesser mortals. But maybe you could tell us just what data the expected goals come from in that grossly inaccurate predictive system.
 
Of course it's the odd incident in comparison to 100,000s of data points lol. And the fact that you only highlight incidents for us, which is a recurring theme in your posts (apart from when we commit fouls, time-waste, etc.), is telling.

I've never said statistics give a perfect model; they're inherently flawed in multiple ways. The trouble is you then consider them to be without value, especially if they challenge your narrative. Both positions are as ludicrous as each other. The fact you call it grossly inaccurate only serves to highlight your limited understanding.

In short, you're a Luddite freaking out at something you can't/won't grasp.

Go away, do some research and then we can chat.
 
I can highlight incidents for Brentford if you like.

Cross flashed across goalmouth. Header way over bar. Deflected free kick well saved. Dangerous cross headed behind for a corner.

All I was saying was the chances for Forest seemed to outweigh those for Brentford (and I see xG scores the game in our favour 0.73 to 0.54, so maybe it does reflect reality in that game).

You don't say what these 'data points' are or why they're so useful, beyond their sheer number. I'm not that into stats that I'm going to dive into it, but I may do some research.
 
I just don't have the inclination to teach someone who isn't willing, or is incapable, how to use statistics. It's a little like a creationist mocking evolution and then demanding a lecture.

I'll give you a very brief answer: data is data. Assuming your collection method is accurate, you have completely unbiased information at your disposal, something that's not true when we watch, especially as fans.

To use data effectively you have to understand its limitations (of which there are many, but you don't have the background to understand them) and its strengths.

So to take your example above, the xG league table is largely irrelevant because it isn't matching up with reality. However, that doesn't mean it's without value.

Expected Goals (xG) gives an indication of the situations we've created that typically lead to a goal. The stats suggest that, compared to rivals, we are creating fewer clear-cut opportunities, and apart from you I think everyone has noticed that we aren't creating a vast number of goalscoring chances. So a manager can look at that and think: how can I adjust strategically (or use the transfer market) to increase the number of times we get into excellent positions? Hence why he keeps adjusting the shape and formation in behind Grabban. SL clearly doesn't think it nonsense...
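
As a rough illustration of how a match xG figure is built up, a team's xG is conventionally the sum of the scoring probabilities of its individual shots. The sketch below assumes five invented shot values, chosen only so that they add up to the 0.73 quoted earlier for Forest against Brentford; the real shot-by-shot numbers are not in this thread. Treating the shots as independent is also an assumption.

```python
from math import prod

# Hypothetical per-shot scoring probabilities for one team in one match.
# Invented for illustration; only their total (0.73) echoes a figure
# quoted in the thread.
shots = [0.05, 0.12, 0.30, 0.08, 0.18]


def match_xg(shot_probs):
    """Expected goals: the sum of the individual shot probabilities."""
    return sum(shot_probs)


def prob_no_goals(shot_probs):
    """Chance that every shot misses, assuming the shots are independent."""
    return prod(1 - p for p in shot_probs)


xg = match_xg(shots)
blank = prob_no_goals(shots)
print(f"xG = {xg:.2f}, chance of failing to score = {blank:.0%}")
```

Under those assumptions a side with an xG of 0.73 still draws a blank a little under half the time, which is why a single match says much less than a run of them.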

What it doesn't take into account (especially in the short term) are things like how clinical a side is: Grabban, when presented with the same opportunities as, say, me, will score more goals, even with the same expected goal outcome.

Similarly it can't take into account the strengths of individual defenders and keepers in one on one duels.

There's a reason why players like Kane or a top keeper allow a team to outperform their xG.

It's also worth noting there are a number of different formulas for xG, and unless you provide the one TalkSport used, along with their dataset for me to interrogate, it's impossible to give a more specific answer.

One small aside: have you ever gone to a match and thought "we were unlucky there"? If so, that's your brain performing an xG calculation itself...
I agree with the broad sweep of this; however, as someone who deals with interpreting datasets on a daily basis, I see very little value in fans discussing games based on anything but the simplest statistics of the game (shots on goal, goals scored, shots saved, passes completed, etc.). Even then, there are far too many variables to consider in interpreting those core stats.

I actually agree, to a degree, with Pebble's analysis inasmuch as there is (very likely) a 'post hoc ergo propter hoc' fallacy embedded in the table he is describing (or, from what he claims, there appears to be).

I don't have the time or inclination to check the data's source for accuracy, or examine the premises inherent in the logic either. But I suspect he is correct that there is someone with a political agenda, and a lot of time to spend in their bedroom in the pursuit of proving themselves smarter than someone else, behind the (specific interpretation of the) statistics. (You know the type!)
 
So far I've found a quiz on BBC Sport and I'm no more impressed.

It shows actual chances and gives % expectations of goals. One of them is a shot from just outside the box, well set up, plenty of time to adjust your feet - they give it a 3% chance of going in. Seems miserly. (Also depends on who it falls to!)

Another shows a low ball deflected fairly fast to someone six yards out at the far post, and the chances of scoring are given as only 14% - but it's not a really difficult chance, and is only kept out by an outstanding save.

So I'm still extremely sceptical about the underlying data and how it's been interpreted.
 
I should have said - this quiz is about xG. There's also a chance for Andy King, good in the air, with the ball headed across to him two yards out. I went for 96%, they said 76% - he scored it, but even if he'd managed to miss it I'd still stand by my subjective, visual, superficial analysis. Hard to see what factors reduced the probability to 76%.
 
The table I saw is reproduced here

http://www.forestforum.co.uk/showthread.php?t=47978&page=3

post 61.
 
Lol
 
I've already highlighted just some of the limitations of data analysis in general and specifically with the xG model.

You've pretty much repeated everything I said; the nuance you're missing is Pebble's belief in conspiracy and mistrust of anything requiring interpretation.
 
Found this at https://www.fantasyfootballfix.com/blog-index/how-we-calculate-expected-goals-xg/

Quantifying Performance
Our dataset consists of 57,000 shots in the Premier League between the 2013/14 and 2018/19 seasons. We randomly split 80% of the data into a training set (used only to train the model) and 20% into a validation set (used only to quantify performance). We perform 5-fold cross validation, in which the dataset is split into 5 equal sized parts and each partition is used in turn to assess performance.
Previous analyses of xG models have aggregated data over the entire season and calculated the r² correlation coefficient between actual and expected goals. However, this loses important information on the accuracy of each individual shot. For this reason, we use the Root Mean Square Error (RMSE) of the validation dataset, defined by
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i}^{n} \left(xG_i - G_i\right)^2}{n}}$$
where xG is the prediction by the model (a probability from 0 to 1) for the shot labelled by the index i, and G is the true outcome (0 for a non-goal, 1 for a goal). A lower RMSE indicates better performance.
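
For anyone who wants to see what that formula does in practice, here is a minimal Python sketch of the RMSE as defined in the quoted passage. The five shots and their outcomes are invented for illustration and are not taken from the article's dataset; the "baseline" that gives every shot the same average conversion rate is my own addition for comparison, not something the article describes.

```python
from math import sqrt


def rmse(predictions, outcomes):
    """Root mean square error between predicted xG and actual outcomes.

    predictions: probabilities between 0 and 1, one per shot.
    outcomes: 1 for a goal, 0 for a non-goal, one per shot.
    """
    if not predictions or len(predictions) != len(outcomes):
        raise ValueError("need two non-empty lists of equal length")
    n = len(predictions)
    squared_errors = ((p - g) ** 2 for p, g in zip(predictions, outcomes))
    return sqrt(sum(squared_errors) / n)


# Hypothetical validation shots: model prediction and what actually happened.
xg_values = [0.05, 0.20, 0.70, 0.10, 0.45]
goals = [0, 0, 1, 0, 1]

model_rmse = rmse(xg_values, goals)

# Naive baseline (an assumption for comparison): give every shot the same
# probability, namely the overall conversion rate of the sample.
base_rate = sum(goals) / len(goals)
baseline_rmse = rmse([base_rate] * len(goals), goals)

print(f"model RMSE    = {model_rmse:.3f}")
print(f"baseline RMSE = {baseline_rmse:.3f}")
```

A lower figure means the predictions sat closer to what actually happened, shot by shot, so a model only earns its keep if it beats something as crude as that flat baseline.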

Hmm...

Happy to say that's beyond me, but is that a snake oil salesman trying to blind me with science?

I note that the article answers some of the doubts I raised:

"One unique aspect of our AI model is that also it takes into account the 2 actions preceding the shot". But how exactly are they factored in?

Far from being a Luddite, I welcome technological advances. I applaud the intentions of xG, but I'm not convinced that it tells us more than we can see with our own eyes.
 
I'm not missing it, just pointing out obliquely that statistics should not be used by the unqualified without the usage of the correct protective equipment!
 
But you're both missing the point.

I don't 'mistrust...anything requiring interpretation'.

I challenge interpretations which admit to being grossly inaccurate (the 'expected' points table for Championship sides after 11 games, allegedly based on statistical analysis of those games, which apart from anything else has Bristol City 23rd without a win when in reality they only have one defeat and have somehow managed to amass FIVE wins!)