Thursday, 6 August 2015

How to compare NBA players (improved analysis)

(Originally posted on Friday, 22 July 2016)

This is an altered version of my previous post on this topic. The main difference is that I calculate values of three different types: offensive, defensive and negative. Offensive values cover points, assists, steals and half of the overall value of rebounds. Defensive values cover blocks, steals and the other half of the overall value of rebounds. The only negative value is for turnovers.

The second main difference is that my improved calculations increase the value of an assist (it was definitely too low) and significantly decrease the value of a scored point (the unit that makes the different statistics comparable).

The values of all the statistics can be applied directly to any game, which is a handy way to check that they are correct. The total offensive values are a very good approximation of the actual box-score points in close, average games. The differences between the total defensive values of the opposing teams, and between their total negative values, additionally explain the differences in points scored.

In my post “How to compare NBA scorers (improved analysis)” I described a way to compare NBA scorers. It was meant as a first step toward comparing NBA players using other statistics as well, but my improved analysis shows that the positive values of high-percentage scorers are purely virtual (a good shooter is already credited for his extra points in the box score) and that the negative values of low-percentage scorers are not so negative after all (their coaches, for some reason, keep them on the floor for quite long, so perhaps their teammates would hurt the team even more if they had to play more minutes or take more shots than they actually do). Finally, weak shooters already punish themselves by not scoring some of the points that better shooters would score. Punishing them further for their poor shooting would be overkill.

The analysis in this post (unlike the one in the post mentioned above) rests on one CRUCIAL principle: all kinds of players (scorers, passers and rebounders) have to do their job for the team to achieve an average result. All of them. So, for example, we CANNOT credit a point guard with 2 points per assist, because the scorer also had to do his job: there would be NO assist if the scorer missed a wide-open shot or a dunk. It works the other way round too: not all of the points should be credited to the actual scorers, because basketball is a team sport after all. The question is how all the statistics should be weighed against each other.


All the statistical data I used comes from this site:
http://www.basketball-reference.com/

I based some of my calculations on the league-wide team averages of TOTAL numbers, which I took from 40 season pages like this one:
http://www.basketball-reference.com/leagues/NBA_2016.html

To be precise: from each such page I took the “League Average” line and summed all 40 of them.


REBOUNDS

I started with the value of a rebound and it turns out that it's a VERY solid foundation on which some other statistical values may be based.

I decided to start with the most extreme example: one team grabs EVERY rebound. To make the analysis easier, I assumed that all the players have a 2P% of 50.0 % and attempt only 2P shots. It goes like this:
1. Team-1 – a FG made – 2 points.
2. Team-2 – a FG made – 2 points.
3. Team-1 – a FG missed – 1 rebound (offensive) for Team-1.
4. Team-1 – a FG made – 2 points.
5. Team-2 – a FG missed – 1 rebound (defensive) for Team-1.
6. Team-1 – a FG missed – 1 rebound (offensive) for Team-1.

This whole sequence repeats throughout the game: the next 2 shots are made (one by each team), starting with Team-1, exactly as at the start of the sequence.

The sums for the sequence are:
Team-1: 4 points and 3 rebounds (2 offensive and 1 defensive).
Team-2: 2 points and 0 rebounds.

Let's put this into a game perspective. How many possessions are there in a whole game? For a team, 100 possessions would mean 100 points (100 shot attempts at a 2P% of 50.0 % times 2 points per made shot), so 200 possessions per game is a good approximation. But there are 6 possessions in the sequence, and 200 can't be divided by 6 without a remainder. For this reason I assumed 204 possessions per game (204/6 = 34 sequences). In a tied game there would be 102 possessions (102 points) for each team.

The sums for the whole game would be:
Team-1: 136 points and 102 rebounds (68 offensive and 34 defensive).
Team-2: 68 points and 0 rebounds.
Differences: 68 points and 102 rebounds.

Rebound's overall value (imprecise): 2/3 (68/102).
Rebound's offensive value (imprecise): 1/3 ((136-102)/102 = 34/102).
Rebound's defensive value (imprecise): 1/3 ((102-68)/102 = 34/102).
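The arithmetic of the first example can be checked with a short Python sketch. It encodes the six-possession sequence as (shooting team, points, rebounding team) tuples, which is only an illustrative representation, repeats it 34 times (204 possessions) and derives the three imprecise rebound values as exact fractions.

```python
from fractions import Fraction

# First example: Team-1 grabs every rebound, everyone shoots 50.0 % on 2P shots.
# Each entry: (shooting team, points scored, rebounding team or None if the shot was made).
SEQUENCE = [
    (1, 2, None),
    (2, 2, None),
    (1, 0, 1),
    (1, 2, None),
    (2, 0, 1),
    (1, 0, 1),
]

def game_totals(sequence, possessions_per_game):
    """Repeat the sequence to fill a game and sum points and rebounds per team."""
    repeats = possessions_per_game // len(sequence)
    points = {1: 0, 2: 0}
    rebounds = {1: 0, 2: 0}
    for team, pts, reb_team in sequence:
        points[team] += pts * repeats
        if reb_team is not None:
            rebounds[reb_team] += repeats
    return points, rebounds

points, rebounds = game_totals(SEQUENCE, 204)
tied_points = 204 // 2  # 102 points for each team in a tied game

reb_diff = rebounds[1] - rebounds[2]
overall = Fraction(points[1] - points[2], reb_diff)
offensive = Fraction(points[1] - tied_points, reb_diff)
defensive = Fraction(tied_points - points[2], reb_diff)
print(points, rebounds)               # {1: 136, 2: 68} {1: 102, 2: 0}
print(overall, offensive, defensive)  # 2/3 1/3 1/3
```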

Calculating separate values for offensive and defensive rebounds gives different results in different examples, BUT calculating a value for ANY rebound ALWAYS gives the same value! The value depends only on the assumed FG%.

To show that this is true, I prepared another example. I assumed that all the players have a 2P% of 50.0 % and attempt only 2P shots (as in the first example), but Team-2 grabs half of the defensive rebounds (and still no offensive rebounds). It goes like this:
1. Team-1 – a FG made – 2 points.
2. Team-2 – a FG made – 2 points.
3. Team-1 – a FG missed – 1 rebound (defensive) for Team-2.
4. Team-2 – a FG missed – 1 rebound (defensive) for Team-1.
5. Team-1 – a FG made – 2 points.
6. Team-2 – a FG made – 2 points.
7. Team-1 – a FG missed – 1 rebound (offensive) for Team-1.
8. Team-1 – a FG made – 2 points.
9. Team-2 – a FG missed – 1 rebound (defensive) for Team-1.
10. Team-1 – a FG missed – 1 rebound (defensive) for Team-2.
11. Team-2 – a FG made – 2 points.
12. Team-1 – a FG made – 2 points.
13. Team-2 – a FG missed – 1 rebound (defensive) for Team-1.
14. Team-1 – a FG missed – 1 rebound (offensive) for Team-1.

This whole sequence repeats throughout the game: the next 2 shots are made (one by each team), starting with Team-1, AND after Team-1's next missed shot the ball goes to Team-2, exactly as at the start of the sequence.

The sums for the sequence are:
Team-1: 8 points and 5 rebounds (2 offensive and 3 defensive).
Team-2: 6 points and 2 rebounds.

Let's put this into a game perspective. There are 14 possessions in the sequence, and 200 can't be divided by 14 without a remainder. For this reason I assumed 196 possessions per game (196/14 = 14 sequences). In a tied game there would be 98 possessions (98 points) for each team.

The sums for the whole game would be:
Team-1: 112 points and 70 rebounds (28 offensive and 42 defensive).
Team-2: 84 points and 28 rebounds (0 offensive and 28 defensive).
Differences: 28 points and 42 rebounds.

Rebound's overall value (imprecise): 2/3 (28/42).
Rebound's offensive value (imprecise): 1/3 ((112-98)/42 = 14/42).
Rebound's defensive value (imprecise): 1/3 ((98-84)/42 = 14/42).
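The same check for the second example, as a self-contained sketch with the same illustrative tuple encoding: 14 repeats of the 14-possession sequence and a tied-game baseline of 98 points per team.

```python
from fractions import Fraction

# Second example: Team-2 grabs half of the defensive rebounds, no offensive rebounds.
# Each entry: (shooting team, points scored, rebounding team or None if the shot was made).
SEQUENCE = [
    (1, 2, None), (2, 2, None), (1, 0, 2), (2, 0, 1),
    (1, 2, None), (2, 2, None), (1, 0, 1), (1, 2, None),
    (2, 0, 1), (1, 0, 2), (2, 2, None), (1, 2, None),
    (2, 0, 1), (1, 0, 1),
]
POSSESSIONS = 196        # 196 / 14 = 14 repeats of the sequence
TIED = POSSESSIONS // 2  # 98 points for each team in a tied game

repeats = POSSESSIONS // len(SEQUENCE)
points = {1: 0, 2: 0}
rebounds = {1: 0, 2: 0}
for team, pts, reb_team in SEQUENCE:
    points[team] += pts * repeats
    if reb_team is not None:
        rebounds[reb_team] += repeats

reb_diff = rebounds[1] - rebounds[2]
overall = Fraction(points[1] - points[2], reb_diff)
offensive = Fraction(points[1] - TIED, reb_diff)  # Team-1's points above the tied baseline
defensive = Fraction(TIED - points[2], reb_diff)  # Team-2's points below the tied baseline
print(points, rebounds)               # {1: 112, 2: 84} {1: 70, 2: 28}
print(overall, offensive, defensive)  # 2/3 1/3 1/3
```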

As you can see, the (imprecise) values are the same as before. Some of the rebounds cancel each other out, but what is left explains the point difference perfectly (at least in the examples). However, the examples assumed a 2P% of 50.0 %, while the league-wide average 2P% over the last 40 years was 48.3 %.

When both teams shoot a higher percentage, the value of a rebound is higher, and when both teams shoot a lower percentage, the value of a rebound is lower. If both teams missed all their shots, the value of a rebound would be ZERO!

Let's divide the overall value of a rebound from the examples above (2/3) by the assumed 2P% of 50.0 %: 2/3 / 0.5 = 4/3 = 1.333(3) = 133.333(3) %. It means that for the assumed 2P% of 50.0 % the value of a rebound is 33.333(3) % higher than the assumed 2P% itself. So the difference between the assumed 2P% (0.5) and the league-wide average 2P% over the last 40 years (0.483) was worth (0.5 – 0.483) * 4/3 = 0.0226(6). We have to subtract this from the value of a rebound in the examples: 2/3 – 0.0226(6) = 0.6666(6) – 0.0226(6) = 0.644.

BUT the examples above made another assumption: all field goals were worth only 2 points. In reality some of them are 3-pointers. Over the last 40 years exactly 9.86 % of field goals were 3-pointers, so the average value of a field goal is 2.0986 (0.0986 * 3 + 0.9014 * 2). This value (2.0986) is 4.93 % higher than the 2.0 I assumed in the examples above (field goals only from 2P shots).

BUT (again), if we take 3-pointers into account, we have to use FG% instead of 2P%. The average FG% over the last 40 years was 46.5 %. So the difference between the assumed 2P% (0.5) and the league-wide average FG% over the last 40 years (0.465) was worth (0.5 – 0.465) * 4/3 = 0.0466(6). We have to subtract this from the value of a rebound in the examples: 0.6666(6) – 0.0466(6) = 0.62.

So the value of a rebound with 3-pointers included is 0.62 * 1.0493 = 0.65 (a little higher than the 0.644 calculated for 2P shots only). Half of this value is offensive and half is defensive.

TRB_O (before fine-tuning) = 0.325 * TRB
TRB_D (before fine-tuning) = 0.325 * TRB
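A small sketch of this adjustment, under the text's assumption that the rebound value scales linearly with shooting percentage (4/3 per unit of percentage). The helper name `rebound_value` is hypothetical.

```python
BASE_VALUE = 2 / 3          # overall rebound value from the examples (2P% = 50.0 %)
SLOPE = BASE_VALUE / 0.5    # 4/3: rebound value per unit of shooting percentage

def rebound_value(pct):
    """Shift the base value by the gap between 50.0 % and the given shooting percentage."""
    return BASE_VALUE - (0.5 - pct) * SLOPE

two_point_only = rebound_value(0.483)             # league-wide 2P% over 40 seasons
avg_fg_value = 0.0986 * 3 + 0.9014 * 2           # 2.0986 points per made field goal
with_threes = rebound_value(0.465) * avg_fg_value / 2.0  # FG% plus the 3-pointer boost

print(round(two_point_only, 3), round(avg_fg_value, 4),
      round(with_threes, 2), round(with_threes / 2, 3))  # 0.644 2.0986 0.65 0.325
```

Because 2/3 – (0.5 – p) * 4/3 simplifies to 4/3 * p, the adjustment amounts to multiplying the shooting percentage by 4/3.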

I don't like such precise values, and I will fine-tune them at the end to make them easier to remember and use, but for the precise calculations of the other values I will use these rebound values.

There is one VERY important thing left to analyse for rebounds: what does the rebound value mean in a tied game? In the examples above the rebounds explained both the extra points for Team-1 and the missing points for Team-2. Half of the rebound value was “offensive” and half was “defensive”. In the second example some defensive values cancelled out some offensive values, but the values themselves stayed the same. So in a tied game ALL the offensive values are cancelled out by ALL the defensive values, but the values themselves should stay the same too (as the two examples showed).

In a tied game with 100 points and 50 rebounds for each team the offensive value of 50 rebounds would be 16.25 (0.325 * 50). It means that on average 16.25 points (out of 100 scored) were earned by sheer rebounding, so they should be credited to players with rebounds, NOT to the actual scorers!

With only rebounds and points (and no assists or other statistics), the scorers should be credited with 83.75 points (100 – 16.25), which is 83.75 % of the points they scored. But other statistics (most importantly assists and steals) do influence scoring, so the value of a scored point is LESS than 0.8375 points. The question is how much less; this is described toward the end.


ASSISTS

You can't build an example for assists similar to the one for rebounds, so I was left only with common sense and the actual numbers from the last 40 years. The totals were:
a) field goals made: 127185,
b) field goals attempted: 273321,
c) assists: 75567.

The average FG% was 127185/273321 = 46.5 % (as I wrote before). However, an assist means 1 FG made on 1 FG attempted, so without assists the numbers are much lower:
a) field goals made without an assist: 51618 (127185 – 75567),
b) field goals attempted without an assist: 197754 (273321 – 75567).

The average FG% for field goals without an assist was 26.1 % (51618/197754). ONLY 26.1 %!!! What does it mean? It means that team play is much better than individual play. Pretty obvious, isn't it?
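The FG% calculation with and without assists, as a quick sketch using the 40-season totals quoted above.

```python
# League totals over the last 40 seasons, as quoted in the text.
FGM, FGA, AST = 127185, 273321, 75567

fg_pct = FGM / FGA
# Each assist is one made shot on one attempt, so remove it from both totals.
fgm_unassisted = FGM - AST
fga_unassisted = FGA - AST
fg_pct_unassisted = fgm_unassisted / fga_unassisted

print(f"{fg_pct:.3f} {fgm_unassisted} {fga_unassisted} {fg_pct_unassisted:.3f}")
# 0.465 51618 197754 0.261
```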

Individual play means that scorers can score all by themselves, although with a very low FG%. So an assist means that the shot was made MORE probable by (100.0 % – 26.1 %) = 73.9 %. But this added probability was thanks to team play, not to the assist alone.

Team-play means that passers and scorers complement each other. Even after a VERY good pass there would be NO assist if the scorer missed a wide open shot or a dunk (such a situation falls into the category of FG% without an assist). So the added probability of a field goal (with an assist) means that BOTH the passer and the scorer did their job.

Whose job is easier? On average the scorer's job is easier, BUT good scorers make the point guard's job easier too. A given point guard would get fewer assists playing with weak scorers than playing with good scorers. I assumed that 70.0 % of the credit for a field goal after an assist should go to the passer and 30.0 % to the scorer. This is very subjective, but I HAVE to make some assumption to be able to calculate anything.

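The average value of a field goal is 2.0986, but we have to remember that 16.25 % of all the points should be credited to the rebounders. Multiplying the added probability (0.739), the passer's share (0.7), the average FG value (2.0986) and the share not credited to rebounders (1 – 0.1625) gives: 0.739 * 0.7 * 2.0986 * (1 – 0.1625) = 0.909.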

AST_O (before fine-tuning) = 0.909 * AST
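The assist value written out as a sketch; the constant names are only labels for the numbers used in the text, including the subjective 70/30 split.

```python
AVG_FG_VALUE = 2.0986    # average points per made field goal
REBOUND_SHARE = 0.1625   # share of points credited to rebounders
ADDED_PROB = 1 - 0.261   # extra make probability attributed to team play
PASSER_SHARE = 0.7       # assumed share of that credit going to the passer

ast_value = ADDED_PROB * PASSER_SHARE * AVG_FG_VALUE * (1 - REBOUND_SHARE)
print(round(ast_value, 3))  # 0.909
```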


BLOCKS

A block stops a shot completely, so it negates the average field-goal value of shots that are not blocked (a block counts as a missed shot for the shooter). Over the last 40 years there were 127185 FG made, 273321 FG attempted and 16534 blocks. So the average FG% for non-blocked shots was 127185 / (273321 – 16534) = 49.5 %.

The average value of a field goal is 2.0986, so the average value of a non-blocked shot is 2.0986 * 0.495 = 1.04, but hardly any blocks occur against 3-point shots, so I can safely round it down to 1.00. The same analysis for 2P shots only gives a block value of 0.99.

BLK_D = 1 * BLK
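The block calculation as a sketch, using the 40-season totals from the text.

```python
# League totals over the last 40 seasons, as quoted in the text.
FGM, FGA, BLK = 127185, 273321, 16534
AVG_FG_VALUE = 2.0986  # average points per made field goal

# A block is recorded as a missed attempt, so remove blocked shots from the attempts.
fg_pct_unblocked = FGM / (FGA - BLK)
# A block negates the expected value of an unblocked shot.
blk_value = AVG_FG_VALUE * fg_pct_unblocked
print(f"{fg_pct_unblocked:.3f} {blk_value:.2f}")  # 0.495 1.04
```

The text then rounds 1.04 down to 1.00, since hardly any blocks happen on 3-point attempts.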


TURNOVERS (negative) and STEALS (defensive)

The analysis is similar to the one for blocks, but with the normal FG average: 2.0986 * 0.465 = 0.976. This value is lost with every turnover, but some turnovers (51.0 %) are steals. The question is how much we should credit the stealer and how much we should punish the player who committed the turnover.

Once again I have to point out that all the players should do their job, and their job includes playing defence. On defence they should be aware of what is going on around them and they should steal weak passes. But some steals should be fully credited to the stealer, because he played VERY good defence. How many such above-average steals are there?

I assume that 35.0 % of steals should be credited to the stealer (VERY good defence) and 65.0 % should be “credited” to the player who committed the turnover. Turnovers without a steal (49.0 %) should be fully “credited” to the player who committed them: -0.976 * (0.51 * 0.65 + 0.49 * 1) = -0.802. The positive defensive value of a steal is 0.976 * 0.35 = 0.342.

TOV_N (before fine-tuning) = -0.802 * TOV
STL_D (before fine-tuning) = 0.342 * STL
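A sketch of the turnover and defensive steal values; the 51.0 % steal rate and the 35/65 split are the assumptions stated above.

```python
AVG_FG_VALUE = 2.0986   # average points per made field goal
FG_PCT = 0.465          # league-wide FG% over 40 seasons
STEAL_RATE = 0.51       # share of turnovers that are steals
STEALER_SHARE = 0.35    # assumed share of a steal credited to the stealer

possession_value = AVG_FG_VALUE * FG_PCT  # about 0.976 points lost per turnover
# Stolen turnovers: 65 % charged to the player; unforced turnovers: fully charged.
tov_value = -possession_value * (STEAL_RATE * (1 - STEALER_SHARE) + (1 - STEAL_RATE))
stl_def_value = possession_value * STEALER_SHARE
print(f"{possession_value:.3f} {tov_value:.3f} {stl_def_value:.3f}")  # 0.976 -0.802 0.342
```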


STEALS (offensive)

The offensive value of a steal means “easy points coming from steals” (the added probability of a field goal): some fast breaks after steals end with an easy field goal. But not every such fast break ends with a field goal, and not every steal results in a fast break.

On one hand, a steal is not as valuable offensively as an assist, because it doesn't make the shot 100 % certain. On the other hand, a steal means there was no need for a defensive rebound to gain possession, so those 16.25 % should NOT be credited to a rebounder. The change in possession belongs to defence (already described), so those 16.25 % should NOT be credited to the stealer either.

Let's assume that 66.6(6) % (2/3) of steals end with very easy points that make everybody's job MUCH easier (it's impossible to judge how many assists come on fast breaks, so I ignore this issue completely). The average FG% is 46.5 % and the average value of a field goal is 2.0986, so the calculation is 2/3 * (1 – 0.465) * 2.0986 = 0.749.

STL_O (before fine-tuning) = 0.749 * STL
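The offensive steal value as a sketch; the 2/3 share of steals that lead to easy points is the assumption stated above.

```python
AVG_FG_VALUE = 2.0986  # average points per made field goal
FG_PCT = 0.465         # league-wide FG% over 40 seasons
EASY_SHARE = 2 / 3     # assumed share of steals that end in very easy points

# Credit the stealer with 2/3 of the probability a normal possession would miss.
stl_off_value = EASY_SHARE * (1 - FG_PCT) * AVG_FG_VALUE
print(round(stl_off_value, 3))  # 0.749
```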


PERSONAL FOULS

I don't assign any value to personal fouls. Why? Not every foul leads to free throws, and some fouls that do are GOOD, because they were committed either on a weak FT shooter or to prevent easy points. And the offensive value of a free throw (for the opposing team) is already reflected in that team's points.

PF_D = 0


POINTS

Knowing the value of a rebound, the value of an assist and the value of a steal, I can calculate how much is left for the points.

For points after an assist the calculations are:
(1 – 0.1625) * 2.0986 * 0.261 + (1 – 0.1625) * (1 – 0.261) * 2.0986 * 0.3 = 0.848.
The first part refers to the share of a field goal that scorers are able to score all by themselves, although with a very low FG% of 26.1 % (16.25 % of the average FG value at that FG% is credited to the rebounder and the rest to the scorer). The second part refers to the added probability of a field goal (16.25 % of the added average FG value is credited to the rebounder; of what is left, 70.0 % goes to the assist maker and 30.0 % to the scorer).

For points from a steal the calculations are:
0.465 * 2.0986 + 1/3 * (1 – 0.465) * 2.0986 = 1.350.
The first part refers to the part of a field goal that the scorers are able to score with all the players doing their average job, and the second part refers to the added probability of a field goal (2/3 of that added probability is credited to the stealer, so 1/3 goes to the scorer). A steal means that there was no need for a defensive rebound, which is why no rebounder should be credited in such situations.

For points with neither an assist nor a steal the calculation is:
2.0986 * (1 – 0.1625) = 1.758.
The credit for 16.25 % of the average FG value with the average FG% goes to the rebounder and the rest goes to the scorer.
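The three per-field-goal credits can be reproduced with a short sketch; the constants are the ones used in the text.

```python
AVG_FG_VALUE = 2.0986    # average points per made field goal
FG_PCT = 0.465           # league-wide FG% over 40 seasons
UNASSISTED_PCT = 0.261   # FG% on shots without an assist
REBOUND_SHARE = 0.1625   # share of points credited to rebounders
SCORER_SHARE = 0.3       # scorer's share of the added probability after an assist

after_assist = ((1 - REBOUND_SHARE) * AVG_FG_VALUE * UNASSISTED_PCT
                + (1 - REBOUND_SHARE) * (1 - UNASSISTED_PCT) * AVG_FG_VALUE * SCORER_SHARE)
after_steal = FG_PCT * AVG_FG_VALUE + (1 / 3) * (1 - FG_PCT) * AVG_FG_VALUE
other = AVG_FG_VALUE * (1 - REBOUND_SHARE)  # neither an assist nor a steal
print(f"{after_assist:.3f} {after_steal:.3f} {other:.3f}")  # 0.848 1.350 1.758
```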

These 3 kinds of situations occur in different numbers, so I calculated a weighted average of the above values using the totals from the last 40 years. The result is 1.128. This is the average value of a FIELD GOAL that should be credited to the scorer. The average field goal is worth 2.0986 points, so 53.75 % (1.128 / 2.0986) of the points scored from field goals should be credited to the scorers.

As for free throws, I decided to credit all of them to the scorers. Yes, some fouls occur during team play, but many are committed early in the play and are the fault of the defender rather than the result of good team play. Moreover, some fouls are tactical: toward the end of the game or against a weak free throw shooter. Finally, my rebound values were calculated for field goals only, so they may not be correct for the points from free throws (in possessions that end in two or more free throws there is no FG%). For these reasons I credit all the points from free throws to the scorers.

PTS_O (before fine-tuning) = 0.5375 * (PTS – FT) + 1 * FT


FINE-TUNING

I applied the offensive values calculated above (for rebounds, assists, steals and points) to the total team-average numbers from the last 40 years (obtained as described at the beginning). The total team-average points I calculated came to 341676.32, while the actual total was 331374. The difference was 10302.32, so the error was ONLY 3.1 %!

To be honest, I was glad there was some error, because it gave me a reason to fine-tune the too-precise values.

My FINAL (fine-tuned) values are:

PTS_O = 0.5 * (PTS – FT) + 1 * FT
TRB_O = 0.33 * TRB
TRB_D = 0.33 * TRB
AST_O = 0.9 * AST
STL_O = 0.75 * STL
STL_D = 0.33 * STL
BLK_D = 1 * BLK
TOV_N = -0.8 * TOV
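A minimal sketch of how these final values could be applied to a single box-score line. The `BoxLine` class and `player_values` function are hypothetical helpers, FT is taken to mean free throws made, and the stat line in the usage is made up purely to show the arithmetic.

```python
from dataclasses import dataclass

@dataclass
class BoxLine:
    """One box-score line (hypothetical field names)."""
    pts: int
    ft: int   # free throws made, each worth 1 point
    trb: int
    ast: int
    stl: int
    blk: int
    tov: int

def player_values(b: BoxLine) -> dict:
    """Apply the fine-tuned weights to one box-score line."""
    offensive = 0.5 * (b.pts - b.ft) + 1 * b.ft + 0.33 * b.trb + 0.9 * b.ast + 0.75 * b.stl
    defensive = 0.33 * b.trb + 0.33 * b.stl + 1 * b.blk
    negative = -0.8 * b.tov
    return {"offensive": offensive, "defensive": defensive,
            "negative": negative, "overall": offensive + defensive + negative}

# A made-up stat line, only to show the arithmetic.
vals = player_values(BoxLine(pts=20, ft=4, trb=10, ast=5, stl=2, blk=1, tov=3))
print({k: round(v, 2) for k, v in vals.items()})
# offensive 21.3, defensive 4.96, negative -2.4, overall 23.86
```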

The total team-average points I calculated using my final offensive values (for points, rebounds, assists and steals) came to 331707.83, while the actual total was 331374. The difference was 333.83, so the error was ONLY 0.1 %!!!
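The two error figures can be checked with a few lines, using the totals quoted above.

```python
ACTUAL = 331374      # actual total team-average points over 40 seasons
before = 341676.32   # total from the values before fine-tuning
after = 331707.83    # total from the final values

for label, predicted in (("before", before), ("after", after)):
    diff = predicted - ACTUAL
    print(f"{label}: diff {diff:.2f}, error {diff / ACTUAL:.1%}")
# before: diff 10302.32, error 3.1%
# after: diff 333.83, error 0.1%
```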

The total negative value of turnovers was -41757.60 and the total positive defensive value of rebounds, blocks and steals was 71124.25. On average these values cancel each other out between opposing teams, but in a particular game the difference between the values calculated separately for the two teams should explain, to some extent, the actual point difference between them.

The average total values of a team per game were:
Total offensive values: 102.6
Total negative values: -12.9
Total defensive values: 22.0
Total overall values: 111.7


VERIFICATION

To verify my values I used the last 3 NBA games played (games 5, 6 and 7 of the 2016 NBA Finals), my favourite NBA game ever (the Memorial Day Miracle) and game 7 of the 2013 NBA Finals (won by the Miami Heat over the San Antonio Spurs):
http://www.basketball-reference.com/boxscores/201606130GSW.html
http://www.basketball-reference.com/boxscores/201606160CLE.html
http://www.basketball-reference.com/boxscores/201606190GSW.html
http://www.basketball-reference.com/boxscores/199905310SAS.html
http://www.basketball-reference.com/boxscores/201306200MIA.html

The numbers mostly speak for themselves, but please remember that some games are played MUCH differently from an average game, and that only the DIFFERENCE between the overall values counts. The differences between the values are given in square brackets.

Game 1:
Teams: Cleveland Cavaliers at Golden State Warriors
Result: 112 – 97 [15]
Total offensive values: 98.28 – 92.89 [5.39]
Total negative values: -12.80 – -13.60 [0.80]
Total defensive values: 26.16 – 25.17 [0.99]
Total overall values: 111.64 – 104.46 [7.18]

Good enough approximation. Notice that in this game both LeBron James and Kyrie Irving scored 41 points, but LeBron James was MUCH better (44.5 overall against 28.84 overall).


Game 2:
Teams: Golden State Warriors at Cleveland Cavaliers
Result: 101 – 115 [-14]
Total offensive values: 92.90 – 115.45 [-22.55]
Total negative values: -11.20 – -8.00 [-3.20]
Total defensive values: 16.20 – 25.81 [-9.61]
Total overall values: 97.90 – 133.26 [-35.36]

Weak approximation, but it is explained by the fact that one team played significantly below average and the other significantly above average. My values are calculated for an average game, so when the actual numbers of the opposing teams move in opposite directions from the average, the errors get multiplied. It seems that the bigger the difference, the bigger the multiplier.


Game 3:
Teams: Cleveland Cavaliers at Golden State Warriors
Result: 93 – 89 [4]
Total offensive values: 93.39 – 87.42 [5.97]
Total negative values: -8.80 – -8.00 [-0.80]
Total defensive values: 24.15 – 20.18 [3.97]
Total overall values: 108.74 – 99.60 [9.14]

Very good approximation.


Game 4:
Teams: Portland Trail Blazers at San Antonio Spurs
Result: 85 – 86 [-1]
Total offensive values: 90.31 – 88.90 [1.41]
Total negative values: -12.00 – -12.80 [0.80]
Total defensive values: 20.49 – 22.86 [-2.37]
Total overall values: 98.80 – 98.96 [-0.16]

Almost perfect approximation. Cool.


Game 5:
Teams: San Antonio Spurs at Miami Heat
Result: 88 – 95 [-7]
Total offensive values: 88.89 – 85.79 [3.10]
Total negative values: -11.20 – -12.80 [1.60]
Total defensive values: 21.15 – 20.83 [0.32]
Total overall values: 98.84 – 93.82 [5.02]

Strange approximation – pointing at the wrong team. But the absolute error is acceptable.
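The five bracketed differences can be summed and compared with the actual margins in a short sketch. The tuples simply copy the numbers above (away team first), and the last field reports whether the predicted overall difference points at the same team as the actual result.

```python
# (game, actual margin, offensive diff, negative diff, defensive diff), away team first
GAMES = [
    ("Game 1", 15, 5.39, 0.80, 0.99),
    ("Game 2", -14, -22.55, -3.20, -9.61),
    ("Game 3", 4, 5.97, -0.80, 3.97),
    ("Game 4", -1, 1.41, 0.80, -2.37),
    ("Game 5", -7, 3.10, 1.60, 0.32),
]

results = []
for label, margin, off, neg, dfn in GAMES:
    predicted = off + neg + dfn  # overall difference = sum of the three components
    same_team = (predicted > 0) == (margin > 0)
    results.append((label, predicted, same_team))
    print(f"{label}: predicted {predicted:+.2f}, actual {margin:+d}, right team: {same_team}")
```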
