It all started with Moneyball.
When the Oakland Athletics used the Moneyball philosophy to win the American League West title in 2002, they set off a phenomenon in major American sports that is currently a little out of control, but that will hopefully soon level out to a point that works for everyone.
That is the ‘Advanced Stats’ phenomenon.
For those unaware, Moneyball is a book by Michael Lewis (later made into a film starring Brad Pitt) about Billy Beane, the General Manager of the Oakland Athletics baseball team. It tells the story of how Beane used advanced stats and what is known as Sabermetrics to take a club with a significant revenue disadvantage and have it compete with the top teams in baseball.
In short, he used nothing but cold, hard mathematics to make all of his decisions, going against the established tradition of relying on scouts, gut feelings and the like. Many thought it was a suicidal move at the time (he was willingly losing ‘star’ players to get players whose numbers fit the system), but it has since proven to be a revelation not only in baseball but in all the major American professional sports.
Ultimately, it’s something that should have been applied to sports far earlier. It makes complete sense to do things this way in sports, especially as they become more and more professional, and it makes even more sense in a game like baseball.
Baseball has the fewest variables to consider, and it also has the largest sample from which to gather its evidence.
Baseball is, far more than any other major American sport, a one-on-one contest, repeated over and over throughout any given game. It is a pitcher throwing against a batter. Straight up.
Sure, there are plenty of minor variables, from simple things like the altitude or size of the stadium a game is played in, all the way to how well the catcher reads which pitches should be thrown to a particular opponent.
But when you consider Basketball, Ice Hockey, and American Football in the same context (play to play) there is so much daylight between the simplicity of a single baseball ‘at bat’ and the variable complexity of any other play in these sports that it’s almost not worth comparing them.
Add to this the fact that every team plays 162 games each regular season, compared to 82 in the NBA and NHL or 16 in the NFL, and you are left with a lot of data that is easy to dissect and manage.
It’s almost offensive that statistics geeks didn’t get a look in sooner!
The problem comes when everyone wants in on the action, and they want that action now.
When I was a kid following the NBA religiously, there were only so many categories of stats worth looking at. They could fit on the back of a basketball card. There were –
GP – Games Played
MPG – Minutes Per Game
PPG – Points Per Game
APG – Assists Per Game
FG% – Field Goal Percentage
FT% – Free Throw Percentage
3P% – 3 Point Percentage
RPG – Rebounds Per Game
BPG – Blocks Per Game
SPG – Steals Per Game
They were 10 honest stats.
All purely statistical, based on completely simple mathematics and on someone recording the number of times a particular basic play was made.
You used them to prove to everyone that Michael Jordan was the best player in the game.
That John Stockton was a freak of vision on the court.
But basketball wanted a piece of the Moneyball pie, and so now we have more statistics than I can even comprehend.
To just scratch the surface of the Advanced Stats currently used in the NBA, the official NBA website alone now records 12 “Advanced Statistics Categories” for each individual player.
Remember, this doesn’t include any of the basic statistics from when I was a kid. It also doesn’t include the other individual advanced statistics recorded by other companies and sports websites (which are quite literally so numerous I could never give you an accurate count).
This also doesn’t even go into all of the ways that these individual advanced stats are then used to create further advanced stats for teams, individuals and hypothetical scenarios.
Nor does it consider all of the advanced stats kept under categories like “player v player”, “team v team”, “team v player”, “lineup comparison”, the list goes on and on.
And you might be thinking, ‘Well… so what? Where’s the harm in that?’ Well, you’re right in a way, I suppose.
All that information is there so that someone somewhere can try to justify their analytics job. Or attempt to explain why they believe Player A is better than Player B. Or attempt to establish as empirical fact that which can’t really be established as empirical fact.
And that’s where I have a problem with it.
As previously stated, this stuff works in baseball. There is enough sample data, and a small enough list of variables, that there can be truly useful information in there to change and shape the future of the sport.
And the point of this isn’t to say that there isn’t useful information to be had somewhere in this forest of NBA statistical trees. But there is an inherent flaw in the way so many of these statistics are calculated, it’s beginning to mar the whole lot, and a large part of me wants to crawl into a hole and come out in 10 years’ time, when they may have a better handle on it all.
The problem that I’m seeing with a lot of this is that the people who have created a lot of these stats seem to be working backwards.
By that I mean that instead of using pure mathematics to quantify what may or may not be happening, they take what they believe to be true and try to find a formula to prove it. This is basically the opposite of how science is meant to work. In fact, it’s literally the opposite of the Scientific Method.
Take, for a nice simple example, the NBA statistic of “True Shooting Percentage”.
Former ESPN analyst and stat man John Hollinger came up with this statistic as an attempt to determine who is the best true shooter in the game.
This is an advanced statistic that expands on field-goal percentage (FG%) to adjust for the fact that not every shot counted in FG% is equal. It is used to gauge shooting efficiency, taking into consideration points scored from three-pointers, two-point field goals and free throws to measure points scored per shooting attempt.
The technical calculation goes as follows: Points / (2 * (FG Attempts + 0.44 * FT Attempts))
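To make the weighting concrete, here is a minimal Python sketch of that formula set beside plain FG%. The player lines are hypothetical, and the ft_weight parameter is an illustrative addition (defaulting to the published 0.44) so the effect of that one constant can be poked at; it is not part of the official definition.

```python
"""True Shooting Percentage (TS%) next to plain FG%, on made-up player lines."""


def fg_pct(fgm: int, fga: int) -> float:
    """Plain field-goal percentage: every made shot counts the same."""
    if fga == 0:
        raise ValueError("no field-goal attempts")
    return fgm / fga


def true_shooting_pct(points: float, fga: float, fta: float,
                      ft_weight: float = 0.44) -> float:
    """TS% = Points / (2 * (FGA + ft_weight * FTA)).

    ft_weight defaults to the fixed 0.44 in the published formula; exposing
    it as a parameter is an illustrative addition, not part of the definition.
    """
    shooting_attempts = fga + ft_weight * fta
    if shooting_attempts == 0:
        raise ValueError("no shooting attempts")
    return points / (2 * shooting_attempts)


if __name__ == "__main__":
    # Hypothetical A: 10 of 20 two-pointers (20 points), no free throws.
    # Hypothetical B: 8 of 20 three-pointers (24 points), no free throws.
    for name, fgm, fga, points in [("A", 10, 20, 20), ("B", 8, 20, 24)]:
        print(f"{name}: FG% {fg_pct(fgm, fga):.3f}, "
              f"TS% {true_shooting_pct(points, fga, 0):.3f}")
    # prints "A: FG% 0.500, TS% 0.500" and "B: FG% 0.400, TS% 0.600"

    # Hypothetical C: 20 points on 10 FGA and 10 FTA. Only the weight changes.
    for weight in (0.44, 0.5):
        print(f"C (FT weight {weight}): "
              f"TS% {true_shooting_pct(20, 10, 10, ft_weight=weight):.3f}")
```

Player B makes a lower share of shots than A but scores more per attempt, which is what TS% is meant to capture. Player C’s TS% moves with the free-throw weight alone, even though nothing on the court has changed.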
And that’s the problem. The value and/or difficulty of a 3-point shot or a free throw is given an arbitrary weight, simply decided by whoever comes up with the equation. It can be argued that the 3-point shot is weighted fairly, since it counts for exactly the points it is worth relative to a 2-point shot, but this argument falls over when you consider the 0.44 value attributed to free throws.
Others say that the 0.44 reflects the fact that the free throw is an open shot you can set yourself for, and is therefore a much easier shot, but this doesn’t account for the fact that a 3-pointer and a long 2-pointer are almost equally difficult.
You’re either basing it on points scored, or difficulty of shot. You can’t have it both ways. So the true shooting percentage simply can’t be trusted as a pure statistic.
Another big one is the Player Efficiency Rating (PER).
This one is described, nice and vaguely, as “the overall rating of a player’s per-minute statistical production. The league average is 15.00 every season.”
At least with this one they are being honest, showing that it is a statistic worked backwards to make the league average 15. So it’s a formula that gets adjusted depending on how the league as a whole performs.
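The full PER formula isn’t given here, but the ‘league average is 15.00’ part can be illustrated on its own. The sketch below covers only that final rescaling step, assuming we already have some unadjusted per-minute rating for each player and that the league average is a minutes-weighted mean. The player names, the input numbers and the function rescale_to_league_average are all hypothetical, and the many weighted box-score terms and pace adjustment that go into the real PER are left out entirely.

```python
"""Only PER's final 'league average = 15' rescaling step; all inputs hypothetical."""


def rescale_to_league_average(uper: dict[str, float],
                              minutes: dict[str, float],
                              target: float = 15.0) -> dict[str, float]:
    """Scale raw ratings so their minutes-weighted league mean equals target.

    Assumes the league mean is minutes-weighted; pace adjustment and the
    box-score terms that produce the raw ratings are not modelled here.
    """
    total_minutes = sum(minutes.values())
    league_mean = sum(uper[p] * minutes[p] for p in uper) / total_minutes
    scale = target / league_mean
    return {p: rating * scale for p, rating in uper.items()}


if __name__ == "__main__":
    raw = {"X": 0.50, "Y": 0.30, "Z": 0.20}          # hypothetical raw ratings
    mins = {"X": 1000.0, "Y": 1000.0, "Z": 1000.0}   # equal minutes
    scaled = rescale_to_league_average(raw, mins)
    print({p: round(v, 2) for p, v in scaled.items()})

    # A 'better' league where every raw rating doubles.
    doubled = {p: 2 * r for p, r in raw.items()}
    rescaled = rescale_to_league_average(doubled, mins)
    print({p: round(v, 2) for p, v in rescaled.items()})  # same ratings as above
```

Doubling every raw rating leaves the output unchanged: a rating scaled this way only tells you where a player sits relative to that season’s league, not how good the league itself is.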
This one appears to have been invented to prove that the players who were thought to be the best in the league are in fact the best. (However, it appears to me to be an arbitrary statistic that puts a number on that which can’t be measured.)
But then it started not to work out. Players who didn’t seem to be the 5th best player in the league had the 5th best rating, so adjustments were made to try and make the 5th best player in the league (roughly, obviously) come in with the 5th best PER. Then other statistics were invented, like ‘Value Added’, which is “the estimated number of points a player adds to a team’s season total above what a ‘replacement player’ (for instance, the 12th man on the roster) would produce. Value Added = ([Minutes * (PER – PRL)] / 67). PRL (Position Replacement Level) = 11.5 for power forwards, 11.0 for point guards, 10.6 for centers, 10.5 for shooting guards and small forwards”
Jesus! If ever there was an arbitrary calculation…
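The quoted Value Added formula is simple enough to sketch directly. The Python below uses the PRL constants exactly as quoted; the player (2,680 minutes at a PER of 20) is hypothetical and chosen only because 2,680 / 67 = 40 keeps the arithmetic round.

```python
"""A sketch of the quoted Value Added formula; the player is hypothetical."""

# Position Replacement Level (PRL) constants, exactly as quoted.
PRL = {"PF": 11.5, "PG": 11.0, "C": 10.6, "SG": 10.5, "SF": 10.5}


def value_added(minutes: float, per: float, position: str) -> float:
    """Value Added = (Minutes * (PER - PRL)) / 67."""
    return minutes * (per - PRL[position]) / 67


if __name__ == "__main__":
    # Identical minutes and PER; only the position label changes.
    for position in PRL:
        print(f"{position}: {value_added(2680, 20.0, position):.1f}")
    # PF: 340.0, PG: 360.0, C: 376.0, SG: 380.0, SF: 380.0
```

The same season is worth anywhere from 340 to 380 points depending on which position the player is listed at, a 40-point swing that comes entirely from the hand-picked PRL constants.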
But the fact is that when you take this statistic and apply it across the league, the best players (to the eye) are at the top. For now anyway.
But this flies directly in the face of the ideals of Sabermetrics. That was pure mathematics showing that the so-called star players were not necessarily as valuable as others with a lower perceived value. That’s what made it so groundbreaking. These statistics seem to be trying to do the opposite.
The fact that so many of these statistics have had their formulas adjusted since they were first introduced is just as concerning as the way they have been so wholly embraced as the be-all and end-all of measuring these athletes – as evidenced at the MIT Sloan Sports Analytics Conference this year and the blanket media coverage it got.
Look, don’t get me wrong, Advanced Statistics are most definitely the future, and it’s only a matter of time before they start to permeate all of the world’s sports the way they have inundated the US. But working out which ones are valuable and which ones are useless is a long process, and NBA GMs would be wise to proceed with caution while these stats are still in the womb, using them only in moderation until said future actually arrives.