The Problem With Hockey-Reference’s Adjusted Scoring

Hockey-Reference is a great resource for historical NHL data and quick summary statistics for players. The Play Index and detailed player splits are very useful, and the site has also added the kind of summary stats (particularly scoring and save percentage broken down by special teams) that I used to get from NHL.com before SAP ruined its usability.

One thing Hockey-Reference has not been as successful with is developing other stats, like player value metrics. Most people rightly see things like Point Shares and Goals Created as not adding a lot of value. However, one stat frequently cited and largely taken at face value within the analytics community is Adjusted Points (or Adjusted Goals, depending on what type of player is being talked about). I am certainly no exception, having made a case based on adjusted scoring in my last post.

I had heard it claimed that adjusted scoring was biased towards or against certain eras, but hadn’t ever looked at it myself in much detail until a number of articles started popping up suggesting that Alex Ovechkin might be the greatest goal scorer of all time. Some of the exact same arguments have now been flaring up again in the wake of Ovechkin scoring his 500th career goal this past weekend.

What I found particularly interesting was that the cases being made were very heavily based on adjusted scoring. If there was indeed some statistical bias impacting that metric, it would likely affect any conclusions being drawn about Ovechkin’s potential GOAT scoring status, as well as any other cases made for or against other players based on their adjusted numbers. In this post, I’m going to outline the problems with the Hockey-Reference version of adjusted scoring, and then propose an alternate method. In a future post, I’ll look at how my method stacks up in terms of evaluating historical players.

Any adjusted scoring discussion has to begin with the league-wide average number of goals per game. Obviously as the number of goals scored increases, so does the average number of points recorded every night. The problem is that overall scoring is still just one part of an ideal equation. For any given player, we are not just concerned about the size of the pie in terms of league goals for, but how big the slice should be for each goal scorer individually. In the early years of the league, skaters would routinely play most of the game, and therefore it’s not surprising that they would end up with different offensive statistics than a top scorer in today’s game where coaches distribute ice time across four lines. Another important factor has been changing schedule lengths throughout league history, from 50 games in the WWII era to today’s 82, with a couple of lockout-shortened campaigns along the way.

To their credit, Hockey-Reference appears to have anticipated these issues when designing their adjusted points formula. The three adjustment factors are: Schedule adjustment (normalizing for the number of games scheduled in a season), roster size adjustment (maximum roster size divided by 18, to adjust for the differing number of skaters dressed throughout history), and the era adjustment, based on league scoring averages.
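As a rough sketch of how those three factors combine, the function below multiplies a raw goal total by each one in turn. It follows the description in the paragraph above rather than Hockey-Reference’s published formula, so the function name, the 82-game target, and the exact form of each factor are assumptions for illustration. The 6 goals per game baseline is the figure cited later in this post.

```python
def hr_style_adjusted_goals(goals, scheduled_games, max_roster, league_goals_per_game,
                            target_games=82, target_roster=18,
                            target_goals_per_game=6.0):
    """Three-factor adjustment as described in the text (illustrative only)."""
    # Schedule adjustment: scale to a common season length (82 is an assumption).
    schedule_factor = target_games / scheduled_games
    # Roster size adjustment: maximum roster size divided by 18, per the text.
    roster_factor = max_roster / target_roster
    # Era adjustment: baseline scoring level over the season's league average.
    era_factor = target_goals_per_game / league_goals_per_game
    return goals * schedule_factor * roster_factor * era_factor


# Hypothetical inputs: 50 goals in a 50-game season, with the league average
# set equal to the baseline so that only the roster factor differs.
with_15_skaters = hr_style_adjusted_goals(50, 50, 15, 6.0)
with_18_skaters = hr_style_adjusted_goals(50, 50, 18, 6.0)
print(f"{with_15_skaters:.1f} vs {with_18_skaters:.1f}")
```

With everything else held equal, the same season is worth about one sixth less under a 15-skater limit than under an 18-skater limit, which is the effect examined in the rest of this post.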

Of those three, the first and third are reasonable and would be almost universally standard in any adjustment method. It’s the second one that causes me to have serious reservations. The basic concept is sound: we want to take into account not only overall league scoring but also how that scoring is distributed across the lineup. But roster size seems to be a very crude proxy for how difficult it is for individual players to score points. It should be obvious that NHL coaches do not operate like minor novice house league coaches, which is to say that they don’t divide ice time evenly among everyone in the lineup.

For example, it is routine at IIHF events to permit a team to dress 20 skaters, but Team Canada almost always uses the 13th forward and 7th defenceman sparingly. In the 2010 Olympics, Patrice Bergeron was mainly used to take faceoffs in the defensive zone, and in Sochi in 2014 Dan Hamhuis occasionally spotted one of the top three left-handed defencemen for a shift here and there. It would be a stretch to suggest that the presence of Bergeron or Hamhuis in the lineup in any way cost someone like Sidney Crosby or Drew Doughty the opportunity to score points, but the Hockey-Reference version of adjusted Olympic scoring would be forced to give both of them an 11% boost in their stats because Bergeron and Hamhuis spent most of their time sitting on the bench rather than in the press box. Similarly, if we look at historical scoring numbers, there are periods where offence seems to be more evenly split across the entire roster and other segments when the top players generated a higher share. The more important factor for a top line player’s ability to score would seem to be their situational usage rather than how many teammates happened to be in the lineup.

Leaving aside those philosophical objections for the moment, is there any evidence that the roster size adjustment actually works? If so, we should expect to see a roughly even spread of elite seasons across all scoring levels through league history. I pulled the top 60 individual seasons from Hockey-Reference’s list of all-time top single seasons for adjusted goals (not including any players from the 2015-16 season since it is not yet complete) and sorted them by decade through 2014-15. Here’s the breakdown:

Hockey Reference Top 60 Adjusted Goal Seasons by Year

Decade Start    Decade End    Top 60 Seasons
1915-16         1924-25       2
1925-26         1934-35       15
1935-36         1944-45       0
1945-46         1954-55       1
1955-56         1964-65       1
1965-66         1974-75       6
1975-76         1984-85       6
1985-86         1994-95       9
1995-96         2004-05       12
2005-06         2014-15       8
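For anyone wanting to reproduce this kind of breakdown, here is a minimal sketch of the bucketing step. It assumes each season is identified by its starting year (1944 for 1944-45) and that the ten-season buckets begin in 1915-16, as in the table above. The function names and the sample years are made up for illustration and are not the actual top 60 list.

```python
from collections import Counter


def decade_label(start_year, first_year=1915, width=10):
    """Return the ten-season bucket (e.g. '1935-36 to 1944-45') for a season."""
    offset = (start_year - first_year) // width
    lo = first_year + offset * width
    hi = lo + width - 1

    def season(year):
        # Format a starting year as a season string such as '1944-45'.
        return f"{year}-{(year + 1) % 100:02d}"

    return f"{season(lo)} to {season(hi)}"


def tally_by_decade(season_start_years):
    """Count how many seasons fall into each ten-season bucket."""
    return Counter(decade_label(year) for year in season_start_years)


# Hypothetical sample of season start years.
sample = [1944, 1980, 1981, 2007]
for label, count in sorted(tally_by_decade(sample).items()):
    print(label, count)
```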

I think that’s enough on its own to demonstrate that Hockey-Reference does not equally balance all eras, and it points to the roster size adjustment as the likely culprit. The lack of Original Six skaters on the list is a glaring problem. Having not even one top 60 season from the mid-thirties to the mid-forties is particularly interesting since that period ends on one of the most legendary seasons in NHL history, Rocket Richard’s 50 goals in 50 games in 1944-45. Many have made the point that this scoring exploit was in fact very overrated because it came on a very strong team while playing in a watered-down league against opponents that had lost many of their best players to World War II. That is no doubt true, but at the same time an all-time great with excellent teammates dominating weak opposition is exactly the kind of season we would expect to see ranked high on every adjusted scoring list. I would even go so far as to say that any adjusted goal-scoring list that does not have Maurice Richard’s 1944-45 near the very top could reasonably be judged flawed based on that fact alone (unless it somehow simultaneously accounts for league talent level). Not only does Richard not make Hockey-Reference’s top 60, but he comes in at a lowly 119th place, tied with the peak seasons of players like Jeff Carter, Alexei Yashin, Marian Hossa, and Tony Amonte, a ranking that can’t fairly be described as anything other than preposterous.

While roster size has steadily increased, there have been quite a few variations over the years (Rauzulu’s Street has a useful summary), particularly during the ’50s and ’60s when the official rules kept changing, including some weird formats such as allowing home teams to dress an additional player. Leaving most of those quirks aside for simplicity’s sake, here’s the quick and dirty summary of the rule changes that served as milestones in seeing roster sizes increased to today’s standard:

1925-26: Rosters increased to 12
1929-30: Rosters increased to 15
1960-61: Rosters increased to 16
1971-72: Rosters increased to 17
1982-83: Rosters increased to 18

These rule changes line up very closely with the underrepresented periods in the top goal scoring seasons, further confirming my suspicions about the roster size adjustment.
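To put numbers on that, the sketch below looks up the roster limit in effect for a given season from the milestones listed above and converts it to the roster factor (roster size divided by 18) described earlier. It uses the same simplification as the list, ignoring the in-between quirks of the ’50s and ’60s, and the function names are illustrative only.

```python
from bisect import bisect_right

# (first season start year, roster size), taken from the milestone list above.
ROSTER_MILESTONES = [(1925, 12), (1929, 15), (1960, 16), (1971, 17), (1982, 18)]


def roster_size(season_start_year):
    """Roster limit in effect for a season, or None before the first milestone."""
    years = [year for year, _ in ROSTER_MILESTONES]
    index = bisect_right(years, season_start_year) - 1
    if index < 0:
        return None  # the list above gives no figure before 1925-26
    return ROSTER_MILESTONES[index][1]


def roster_factor(season_start_year, target_roster=18):
    """Multiplier applied to raw scoring under a roster-size-over-18 adjustment."""
    size = roster_size(season_start_year)
    return None if size is None else size / target_roster


for year, size in ROSTER_MILESTONES:
    print(f"{year}-{(year + 1) % 100:02d}: roster {size}, factor {size / 18:.3f}")
```

Under this reading, every season played with a 15-skater limit has its totals scaled by roughly 0.83 before any other adjustment is applied, and the penalty only disappears once rosters reach 18 in 1982-83.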

As previously mentioned, we would expect some periods to show up more often than others based on elite talent alone, but that does not account for the observed distribution. Most of the players on the top 60 list are Hall of Famers, many of them all-time greats. For example, between 1935 and 1985, only six guys show up on the list, all of them inner-circle Hall of Famers (Howe, Beliveau, Hull, Esposito, Bossy, Gretzky), and more recently Alex Ovechkin has accounted for five of the eight seasons since 2005-06. Yet there are two brief periods in particular where this pattern is broken and a lot of different names all hit the list, including some that aren’t among the few dozen greatest of all time. The first is the previously discussed late ’20s/early ’30s, with players like Babe Dye, Aurele Joliat, Ace Bailey, Cooney Weiland and Hec Kilrea, and the second is from the mid-’90s to the mid-’00s, which includes top 60 seasons by Alexander Mogilny, Peter Bondra, John LeClair and Milan Hejduk. This would tend to imply that the adjusted numbers from those periods aren’t on a level playing field with everything else.

The lack of representation from the 1980s is interesting since that was an extremely offensive era with two of the best ever in their primes. It seems natural to expect this decade to be leading the charge in any measure of offensive performance, yet from 1980 to 1989 the only seasons that made the top 60 adjusted goals list were by Gretzky (four times), Lemieux (two times), and two guys that played with Gretzky (Kurri and Nicholls once each). Both Gretzky and Lemieux saw their goal scoring numbers fall in the ’90s, the former because of decline and the latter because of injury, even while more players were supposedly posting all-time great adjusted seasons than ever. It seems that not only is there a horrible bias against the Original Six era, but something appears skewed between the ’80s and ’90s as well.

Given that Hockey-Reference’s numbers seem problematic, what other options do we have? There are other ways of attempting to normalize scoring over time; it is just more difficult and time consuming to run through a bunch of calculations yourself than to do a quick search on an easy-to-use website.

The simplest method is to drop the roster size adjustment and just use league goals per game. However, then there is no attempt made to adjust for the distribution of scoring across the lines within a team.

Alternatively, some like to convert a player’s goals or points scored to a percentage of the league leader’s total (or, in the case of the league leader, to the total of the player in second place), to get a sense of their relative dominance. This calculation can be done relative to a different scoring rank, perhaps always against the #2 scorer in the league, or even taking a lower ordinal like 5th or 10th to reduce any impact of outliers.
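A minimal sketch of that relative approach, assuming all that is available is a list of point totals for the season. The function name and the sample totals are hypothetical, and ties are not treated specially.

```python
def percent_of_benchmark(player_points, season_points, rank=1):
    """Express a total as a percentage of the scorer at `rank` (1 = league leader).

    When measuring against the leader and the player is the leader, the
    runner-up is used as the benchmark instead, as described in the text.
    """
    ordered = sorted(season_points, reverse=True)
    benchmark = ordered[rank - 1]
    if rank == 1 and player_points >= benchmark:
        benchmark = ordered[1]
    return 100.0 * player_points / benchmark


# Hypothetical season: point totals of the top four scorers.
totals = [100, 80, 75, 60]
print(percent_of_benchmark(80, totals))          # runner-up vs. the leader
print(percent_of_benchmark(100, totals))         # leader vs. the runner-up
print(percent_of_benchmark(60, totals, rank=2))  # measured against the #2 scorer
```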

I think there perhaps remains a more elegant solution that focuses more directly on what we actually want to measure, which is the share of team offence a top scorer would be expected to generate. We generally want to use adjusted points to try to level the playing field between stars of different eras and use that to determine who we think might have been better within the context that they played in. Therefore, I think the ability of top-end players to score points is the variable that should be used instead of a roster size adjustment to account for varying usage factors throughout history.

I have two suggestions, both based on the similar idea of looking at some selection of top players in the league and trying to figure out the level of offensive participation of that group, rather than relying on the scoring levels of the entire league. I think it’s preferable to look at a group average rather than any individual scoring rank to minimize variance, and I would prefer metrics that aren’t influenced by huge scoring outliers like Gretzky or Lemieux. As such, I decided to look at an average of 10 players per season, and selected them by choosing the players ranked 3rd through 12th in overall scoring (to take any generational talents out of the mix). This should hopefully provide a decent sample of the typical scoring environment for regular star players in the league for any given season.
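The selection step can be sketched as follows, assuming each player is represented as a dictionary with a `points` field. That layout, the function name, and the sample data are assumptions for illustration. How ties at the boundaries are broken is not specified above, so this simply keeps the input order among tied players.

```python
def star_group(players, first_rank=3, last_rank=12):
    """Return the players ranked first_rank through last_rank by points."""
    ranked = sorted(players, key=lambda p: p["points"], reverse=True)
    return ranked[first_rank - 1:last_rank]


# Hypothetical league of 15 scorers with 1 to 15 points each.
league = [{"name": f"Player {n}", "points": n} for n in range(1, 16)]
group = star_group(league)
print(len(group), [p["points"] for p in group])
```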

Method 1 (Points per Game):

Find the total points scored by the players ranked 3rd through 12th in scoring. Divide by the total number of games played by their teams to calculate the average points per scheduled game of the group. Adjusted points can then be calculated using this points per game factor and schedule length.
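Here is one way Method 1 might be written out. The text does not spell out the baseline the season’s points per game is indexed against or the target schedule length, so `baseline_group_ppg` and the 82-game target are assumptions, and all input figures are hypothetical.

```python
def group_points_per_game(group_points, group_team_games):
    """Total points of the 3rd-12th scorers over total games played by their teams."""
    return sum(group_points) / sum(group_team_games)


def adjusted_points_ppg(points, season_group_ppg, baseline_group_ppg,
                        scheduled_games, target_games=82):
    """Scale a raw total by the star-scorer environment and schedule length."""
    # Seasons where stars scored more than the baseline get scaled down.
    era_factor = baseline_group_ppg / season_group_ppg
    schedule_factor = target_games / scheduled_games
    return points * era_factor * schedule_factor


# Hypothetical season: ten players with 100 points each on teams playing 80 games.
season_ppg = group_points_per_game([100] * 10, [80] * 10)
print(season_ppg)
# Hypothetical player: 100 points, season environment 1.5, assumed baseline 1.2.
print(round(adjusted_points_ppg(100, 1.5, 1.2, 80), 1))
```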

Method 2 (Percentage of Team Scoring):

Find the total points scored by the players ranked 3rd through 12th in scoring. Add up the total goals scored by the teams they played on (pro-rating based on games played for players who changed teams during the season). Divide the two figures to determine a percentage of team offence for each season. This percentage can then be used together with the overall league goals per game average and schedule length to determine adjusted scoring.
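A sketch of Method 2 under some loudly stated assumptions: traded players have each team’s goals weighted by the share of the player’s own games spent there, the season’s scoring environment is taken as the group share multiplied by league goals per game, and the baseline share, baseline goals per game, and 82-game target are inputs the reader has to choose. None of these details are specified above, and all figures in the example are hypothetical.

```python
def prorated_team_goals(stints):
    """Team goals for one player, weighted by his games with each team.

    stints: list of (team_goals, player_games_with_team) tuples.
    """
    total_games = sum(games for _, games in stints)
    return sum(goals * games / total_games for goals, games in stints)


def group_share_of_offence(group_points, group_team_goals):
    """Total points of the 3rd-12th scorers over their teams' total goals."""
    return sum(group_points) / sum(group_team_goals)


def adjusted_points_share(points, season_share, season_gpg,
                          baseline_share, baseline_gpg,
                          scheduled_games, target_games=82):
    """Scale a raw total by star share of offence, league scoring, and schedule."""
    season_environment = season_share * season_gpg
    baseline_environment = baseline_share * baseline_gpg
    era_factor = baseline_environment / season_environment
    return points * era_factor * (target_games / scheduled_games)


# Hypothetical traded player: 40 games on a 300-goal team, 40 on a 240-goal team.
print(prorated_team_goals([(300, 40), (240, 40)]))
# Hypothetical season: stars take a 0.30 share in a 7.0 goals per game league,
# against an assumed baseline of a 0.28 share at 6.0 goals per game.
print(round(adjusted_points_share(100, 0.30, 7.0, 0.28, 6.0, 80), 1))
```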

I decided to ignore the early years of the league and start with 1942-43 for my analysis, the first year of the Original Six. There are data issues with numbers from before this season, particularly concerning the number of assists awarded per goal, which I wanted to avoid to make the playing field as level as possible.

Here’s a chart with my two methods compared against Hockey-Reference’s, along with the league average goals per game (often used on its own for quick and dirty adjusted numbers). Each one is indexed to an average scoring context (defined by Hockey-Reference as 6 goals per game), with above one meaning offence was easier to come by (e.g. the wartime years or the 1980s), and below one meaning it was a more difficult time to score (like the 1950s or the current NHL).

[Chart: Adj Scoring V2.0]

The first obvious impression is that while both of my methods track very closely, Hockey-Reference’s adjustments are way out of whack up until league expansion in 1968. Their metric follows the same pattern as the other two; the difference is that the roster size adjustment is artificially (and wrongly) reducing the perceived difficulty of scoring, creating the gap that causes them to underrate every scorer who ever played in the Original Six era. Note also that even an adjustment based solely on overall goals per game would seem to underrate Original Six players, especially between WWII and the late ’50s.

As league rosters increase and the effect of the roster size adjustment correspondingly decreases, the four metrics not surprisingly start to converge. The only period where they all coincide nicely runs from 1982-83, when (surprise, surprise) that roster size adjustment finally goes away, through 1991-92.

After 1991-92, Hockey-Reference’s adjustment (which becomes the same thing as the league goals per game method after roster sizes hit 18) starts to go the other way and exaggerate the difficulty of scoring, an effect that continues all the way through the 2013-14 season. Again, this indicates that adjusted numbers from this period are going to be overstated relative to the rest of league history.

The 1992-93 season, which is the biggest spike in the red and blue lines above, is an excellent example of how league scoring does not affect all players equally. Goals per game went from 3.48 to 3.63, but the average points per game of the #3 through #12 scorers in the league went from 1.31 to 1.51 (the highest mark ever). Not only were two extremely weak expansion teams added to the mix, but the NHL introduced four TV timeouts per period for the first time, which allowed coaches to keep their stars on the ice more often. Power play opportunities were also up relative to the typical levels of the late ’80s and early ’90s, which allowed for even more offence from the best players.
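The size of that mismatch is easy to check from the figures just quoted. The helper below is only a convenience for the arithmetic.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


league_change = pct_change(3.48, 3.63)  # league goals per game
star_change = pct_change(1.31, 1.51)    # points per game, scorers ranked 3rd-12th

print(f"League goals per game: {league_change:+.1f}%")
print(f"Star points per game:  {star_change:+.1f}%")
```

League scoring rose by a little over 4%, while the star group’s scoring rate rose by about 15%, so an adjustment based only on league goals per game would capture less than a third of the change in the environment for top scorers.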

The transfer of offence from depth players to top line forwards can be illustrated by the changes in the percentage of team offence accounted for by scorers ranked 3rd through 12th in each year.

[Chart: Percent of Team Offence]

This is the reason that Hockey-Reference’s adjusted stats seem so favourable to players from the ’90s and ’00s, especially relative to players from the 1980s. Whether it was because of expansion, power plays, TV timeouts, or other factors, first liners have scored a higher percentage of their team’s points in the past 25 years than they did previously. Failing to account for this reality will naturally put players from different eras on an uneven playing field.

Given that my two methods led to such similar results, it would seem to make sense to pick one or the other when running the numbers on historical players, just to keep the analysis as clear and simple as possible. Conceptually, I like the percentage of team scoring idea more, because I think it more directly addresses the question being asked, which is what slice of the offensive pie is being allocated to the typical star forward within that season’s scoring context. It’s also more original, as far as I can tell, than points per game scoring. That said, I couldn’t fault someone for preferring the PPG method since it is an easier calculation and possibly a bit more intuitive. Either way, I think they both would end up at a similar destination, but I’ll proceed using the percentage of team scoring option.

One final test is to see how my chosen method distributes the top seasons across eras, compared to Hockey-Reference’s.

Top 60 Adjusted Goal Seasons by Year (% of Team Method)

Decade Start    Decade End    Top 60 Seasons
1944-45         1954-55       13
1955-56         1964-65       8
1965-66         1974-75       12
1975-76         1984-85       12
1985-86         1994-95       5
1995-96         2004-05       4
2005-06         2014-15       6

While demonstrating that the method does not completely leave out multiple decades and is clearly more balanced, this chart does raise an alternate question of whether the percentage of scoring method might be overly penalizing more recent players to the benefit of past generations. I don’t think this is the case; I think it’s simply the reality that extreme seasons (e.g. one that puts a player in the top 60 all-time) tend to become less frequent over time as the depth of the talent pool and overall parity increase. To use an example from a different sport, half of the top 60 of baseball’s all-time best seasons in Wins Above Replacement came in 1946 or earlier. We should of course be aware of the impact of these factors on the overall numbers once we get to comparing current stars to guys who performed on black-and-white television sets, but it’s better to use a more accurate adjustment method and make any further corrections from there.

Now that I have created a model, next time I’ll use it to look at the question of whether Alex Ovechkin is the best goal scorer ever.
