One hundred years of NHL hockey; some analysis

This post has been updated, with corrected data and modified discussion, as detailed in the text.

Does anything say “100 Years of the National Hockey League” like, say, a Tampa Bay vs Vegas matchup?  Montreal, Toronto, Ottawa?  Please; bunch of party crashers, them.

In case you missed it, National Hockey League play is, today, exactly 100 years old. On December 19, 1917 the first two games in the new league had the Toronto Arenas at the Montreal Wanderers, and the Montreal Canadiens at the Ottawa Senators. This limited slate was due in large part to those being the only four teams in the league.  It turns out that the Wanderers got their first and only win in franchise history, which lasted just six games. They got past the Arenas in the common hockey score of 10-9. The Arenas, conversely, went on–along with the Canadiens–to become one of the two most storied franchises in NHL history: today’s Toronto Maple Leafs. The Senators’ first incarnation lasted until 1934, and after a 58 year absence came the second (and current) version in 1992.

So anyway, there’s hype and hoopla happening, and also discussions of the greatest seasons, teams, players, etc. As for me, I thought it would be great fun to crunch 90 years of team-season numbers to see what they indicated about team records, actual versus expected. Two minutes for tripping, and without even inhaling anything.

Specifically, I’m interested in teams over- or under-playing their expectations (as discussed in detail here, last week), for all NHL seasons from 1927 on (when the NHL first went to a 40+ game schedule), along with the short but interesting run of the World Hockey Association from 1973 to 1979. The base data for all these seasons is available at Hockey Reference, but a fair bit of manipulation is needed to extract and compile a list containing just the teams, locations and scores. Also, in 2005/2006 the NHL changed 80+ years of history, doing away with all ties via the “shootout” system, and you have to remove that effect.

Previous to that, in 1983, the five minute (maximum) OT period had been instituted. For comparing team scoring and records over time, that’s partly just a nuisance (scores of OT games ending as ties are easily standardized by dividing by 1.083, i.e. 65/60). But it’s also partly an analytical problem: OT is sudden death, and so you need the time at which a winning goal was scored to make the correction. But even if you have it, time-to-nearest-event (assuming a random process), follows a gamma distribution whereas regulation time scoring, which is instead defined by the sampling frame (i.e. time), will follow a Poisson. And rates estimated from gamma-distributed data, at low sample sizes, are always biased low, strongly so.
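As a small illustration of that standardization, here is a sketch in Python. The function name and defaults are hypothetical; it only rescales the goals from a game that ran the full 65 minutes and ended tied, and does nothing about the sudden-death problem described above.

```python
REGULATION_MIN = 60
OT_MIN = 5


def standardize_ot_tie(goals, regulation=REGULATION_MIN, overtime=OT_MIN):
    """Rescale goals from a full 65-minute tie game to a 60-minute basis."""
    factor = (regulation + overtime) / regulation  # 65/60, about 1.083
    return goals / factor


# A team scoring 3 goals in a game that stayed tied through OT:
print(round(standardize_ot_tie(3), 3))
```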

The “shootout” system however, is another analytical matter entirely. It’s not just “more hockey”–it’s not in fact hockey at all. For 65 minutes teams play the game of hockey, to a draw…and then three players from each team see if they can beat the opposing goalie, one on one, to decide who is the “winner”. You can’t standardize that out–it’s not the same beast. The cleanest solution is to just analyze game scores at the end of regulation (60 minutes), allowing for ties, which I did. There is a clear analytical approach to take for such questions.

Obviously, one must first define just what an “expected” record even means. Generally (and here) it’s defined as the expected W/L/T record, given a team’s seasonal goals scored, and allowed. This concept is most prevalent in baseball, where the so-called “Pythagorean Expectation” uses just the per-game mean values to do so. As discussed in the previous post, this is always done via fitting ad-hoc models to data, which can potentially introduce serious conceptual error.  If one wants to base the analysis on mean values alone, the Poisson model is the clear and obvious one to use, assuming the seasonal distributions (scored and allowed) actually conform thereto, which must be evaluated.  If both goals scored and allowed do so, then the Skellam distribution directly predicts the expected frequencies of differences between them, which gives a team’s expected record. However, if either one does not conform, you then have no theoretical basis for record estimation, and the only way to proceed is by repeatedly re-sampling from the two actual distributions. The only assumption required is then that the two distributions are independent of each other. This approach is perfectly valid, just more computer-intensive: if the data do in fact conform to a Poisson distribution, the method will give about the same result as will the Skellam distribution, assuming sufficient trials.
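To make the Skellam route concrete, here is a minimal sketch using only the Python standard library. It is not the code used for this post: the function names and the truncation at 40 goals are illustrative assumptions, and the Skellam probabilities are obtained simply by convolving two independent Poisson distributions.

```python
import math


def poisson_pmf(k, mu):
    """Probability of exactly k goals when the per-game mean is mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)


def expected_record(gs_mean, ga_mean, games, max_goals=40):
    """Expected (W, L, T) over `games` games, assuming independent Poisson
    goals scored and allowed. The sign of (scored - allowed) follows a
    Skellam distribution; max_goals truncates a negligible tail."""
    p_for = [poisson_pmf(k, gs_mean) for k in range(max_goals + 1)]
    p_against = [poisson_pmf(k, ga_mean) for k in range(max_goals + 1)]
    win = loss = tie = 0.0
    for i, pf in enumerate(p_for):
        for j, pa in enumerate(p_against):
            p = pf * pa
            if i > j:
                win += p
            elif i < j:
                loss += p
            else:
                tie += p
    return games * win, games * loss, games * tie


# Hypothetical team: 3.2 goals for and 2.8 against per game, 82 games.
w, l, t = expected_record(3.2, 2.8, 82)
print(f"{w:.1f}-{l:.1f}-{t:.1f}")
```

With equal means the expected wins and losses come out identical, which is a quick sanity check on the convolution.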

A few statistical methods notes here. I originally tried testing conformity of each team’s goals scored (and allowed) distributions against Poisson expectation using a Pearson chi-squared goodness-of-fit test. Should have known better!  That test is well known to be sensitive to the number of infrequent observations in the data (which there always are in data like these), a problem which is “solved” by various ad-hoc methods involving dropping observations from the data. Each such can well give a different test result…which in fact makes the whole approach an unacceptable non-solution.  Indeed when I tried several such, each gave an entirely different result. So, I instead tested significance by creating a reference distribution by simulating the distribution of Poisson joint probabilities (i.e. likelihood functions), based on observed mean values of goals scored and allowed, over 1000 seasons. If goal scoring and allowing are truly random (Poisson) processes, these probabilities should then distribute uniformly (evenly) across [0,1], a conformity which is eminently testable (via the Kolmogorov-Smirnov (K-S) test for continuous distributions).
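One way such a simulation-based test could look is sketched below, standard library only. The details are assumptions on my part rather than a record of the actual code: the p value here is the one-sided share of simulated seasons whose Poisson log-likelihood is at or below the observed one, evaluated at the observed mean, and the K-S step returns only the distance statistic, not its significance.

```python
import math
import random


def poisson_sample(mu, rng):
    """Knuth's multiplication method; fine for small means like goals per game."""
    limit = math.exp(-mu)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def poisson_loglik(goals, mu):
    """Joint log-probability of a season of per-game goal counts under Poisson(mu)."""
    return sum(-mu + g * math.log(mu) - math.lgamma(g + 1) for g in goals)


def season_p_value(goals, n_sim=1000, seed=0):
    """Share of simulated Poisson seasons at least as unlikely as the observed one."""
    rng = random.Random(seed)
    mu = sum(goals) / len(goals)  # assumes the team scored at least once
    observed = poisson_loglik(goals, mu)
    below = 0
    for _ in range(n_sim):
        sim = [poisson_sample(mu, rng) for _ in goals]
        if poisson_loglik(sim, mu) <= observed:
            below += 1
    return below / n_sim


def ks_statistic_uniform(p_values):
    """Kolmogorov-Smirnov distance between a set of p values and Uniform(0, 1)."""
    xs = sorted(p_values)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
```

Running `season_p_value` on every team-season and passing the resulting list to `ks_statistic_uniform` gives the quantity a K-S test would then assess.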

Okay, so onto the results.

Below are the probability distributions for goals scored and allowed, respectively, over all 1514 NHL/WHA full team-seasons from 1927 to 2017. Note that I included playoff games in this analysis (mainly because it was a pain to try to separate the regular season from playoff games in the Hockey Reference database I scraped the data from).

[Figure: distribution of p values for goals scored, full seasons]

Re-sampling based distribution of the probability of the observed distributions of goals scored exceeding Poissonian expectations, across 1514 NHL and WHA seasons, 1927-2017

[Figure: distribution of p values for goals allowed, full seasons]

Re-sampling based distribution of the probability of the observed distributions of goals allowed exceeding Poissonian expectations, across 1514 NHL and WHA seasons, 1927-2017

A K-S test for conformity to a uniform distribution returns a p value of 1.0 in both cases: they conform, and therefore there is exceedingly strong evidence for goals scored and allowed each following a random (Poissonian) process, over NHL/WHA history from 1927 on (1514 team-seasons).  Note however that for goals scored (top figure) there may well be some non-randomness operating in the 1% of observations at the very left of the histogram.  That is, there may have been a few teams whose goals scored deviated from purely random expectation.  Nevertheless, the K-S test tests conformity across the entire range of possible values, hence returning a probability of 1.0.

The probabilities above are two-tailed, and hence they do not discriminate between an over- vs under-dispersed departure from a Poisson expectation.  If the data truly follow a Poisson process, which is probabilistic (stochastic), then departures from perfect conformity to the Poisson should be equally over-, and under-, dispersed, i.e. occurring in equal frequencies and magnitudes.  Over-dispersion manifests as variances higher than predicted by the Poisson (i.e. more very high, and low, scoring games, than predicted), whereas under-dispersion manifests as lower variances (more scores close to the mean), than expected.  The p values shown above cannot discriminate between these two conditions, so I also compute the variance to mean ratio for each team-season.  These should vary stochastically around the value of 1.0 (and hence the logarithms thereof should vary symmetrically around 0.0).
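The variance to mean ratio is simple to compute for any one team-season. A minimal sketch follows; the function name is hypothetical, and the use of the sample (n - 1) variance is an assumption.

```python
import math
import statistics


def log_var_to_mean(goals):
    """Natural log of the variance-to-mean ratio of per-game goal counts.
    Near 0 suggests Poisson-like dispersion; above 0 is over-dispersed,
    below 0 is under-dispersed."""
    mean = statistics.fmean(goals)
    var = statistics.variance(goals)  # sample variance, n - 1 denominator
    return math.log(var / mean)


# Made-up per-game goals for a short stretch of games:
print(round(log_var_to_mean([2, 4, 1, 3, 5, 0, 3, 2, 6, 4]), 3))
```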

The results (goals scored):

[Figure: distribution of variance-to-mean ratios for goals scored, full seasons]

Natural logarithm (base e) of variance to mean ratio for goals scored across 1514 NHL/WHA team-seasons from 1927-2017

And for goals allowed:

[Figure: distribution of variance-to-mean ratios for goals allowed, full seasons]

Natural logarithm (base e) of variance to mean ratio for goals allowed across 1514 NHL/WHA team-seasons from 1927-2017

I could test those two for conformity to a normal distribution, but just eyeballing it, it’s clearly pretty normal and I highly doubt that it deviates therefrom significantly.  Combined with the p value distributions from the first two histograms, and the K-S test results, the results collectively and strongly indicate that NHL/WHA teams both score and allow goals according to a random dispersion around the mean.  The Skellam distribution will therefore predict well the expected W/L/T records across NHL/WHA history from 1927 on.

But I didn’t know beforehand that I would get these results, and so to evaluate departures of W/L/T records from expectation I had just gone ahead with the re-sampling approach, using 1000 trials for each team-season (this by the way took just five to ten minutes computing time on a fairly fast desktop).  I normalized all results to a per-game rate (to account for NHL seasons increasing from 44 to 82 games from 1927 to 2017), and then sorted the results by team and season, to see which team-seasons deviated most strongly (positively and negatively) from Poissonian expectations.  There are two possible ways deviations can be computed, and I give both. A team may (1) have more wins than expected, at the expense of fewer total ties plus losses, or (2) have more combined wins and ties, at the expense of fewer losses.  Teams translating fewer losses directly into more wins, with stable tie numbers should show up in either case. The same procedure is also applied to teams who under-perform, and I also analyzed every team’s half-seasons of home, and road, games.
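The re-sampling idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code actually run: the function name and seed handling are hypothetical, and each trial rebuilds a season by drawing goals scored and goals allowed independently, with replacement, from the team’s observed per-game values.

```python
import random


def resampled_record(scored, allowed, n_trials=1000, seed=0):
    """Expected (W, L, T) per season from independent re-sampling of a
    team's observed per-game goals scored and goals allowed."""
    rng = random.Random(seed)
    n_games = len(scored)
    wins = losses = ties = 0
    for _ in range(n_trials):
        for _ in range(n_games):
            gf = rng.choice(scored)
            ga = rng.choice(allowed)
            if gf > ga:
                wins += 1
            elif gf < ga:
                losses += 1
            else:
                ties += 1
    return wins / n_trials, losses / n_trials, ties / n_trials


# Made-up ten-game stretch:
scored = [3, 2, 5, 1, 4, 2, 3, 0, 6, 2]
allowed = [2, 2, 3, 4, 1, 2, 5, 1, 3, 2]
print(resampled_record(scored, allowed))
```

Subtracting these expected values from the actual W/L/T record gives the raw differences reported in the tables below.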

Alrighty then…just who were the big over-/under-achievers?

Update: The original post material is contained within the two lines of ### demarcations beginning below–scroll beyond (to New Table 1 and New Table 2) if you want to skip that. The corrections made change the teams on the original lists below, but not the final conclusions. The errors arose from mistakenly labeling two sets of columns in my working data file identically, and then using the wrong set.
#############################################################################
Original post:

First, here are the top 25 over-achievers by team and season, for the two methods. The W/L/T columns are teams’ actual records, as assessed at the end of regulation time. GS and GA are the mean goals scored and allowed over the season and the “.diff” columns are the raw differences between actual, and expected, wins (eg, the 1930 Boston Bruins had 4.59 more wins than expected, 3.73 fewer losses, and 0.86 fewer ties). Last column is the per-game standardized value for the difference metric used.

First method (Diff.1 = Win difference – (Loss difference + Tie difference)), i.e. teams that translated any combination of fewer losses and ties than expected, into more wins than expected:

   Season League Team  W  L  T   GS   GA W.diff L.diff T.diff Diff.1
1    1930    NHL  BOS 38  6  6 3.80 2.16   4.59  -3.73  -0.86  0.520
2    1944    NHL  MTL 44  6  8 4.59 2.10   1.13  -2.24   1.11  0.517
3    1977    NHL  MTL 70  9 14 4.71 2.06  -2.69  -1.85   4.54  0.505
4    1976    NHL  MTL 69 12 11 4.10 2.14   2.37  -1.87  -0.50  0.500
5    1978    NHL  MTL 70 12 13 4.38 2.22   0.01  -2.16   2.15  0.474
6    1945    NHL  MTL 40 11  5 4.45 2.41   1.26   0.27  -1.53  0.429
7    1972    NHL  BOS 66 15 12 4.24 2.55   4.80  -5.98   1.18  0.419
8    1971    NHL  BOS 60 18  7 5.00 2.76  -2.78   4.47  -1.69  0.412
9    1939    NHL  BOS 41 11  8 2.95 1.52   2.53  -0.25  -2.28  0.367
10   1975    WHA  HSA 61 23  6 4.71 2.97   0.79   3.53  -4.32  0.356
11   1984    NHL  EDM 67 20 12 5.40 3.72   4.07  -5.75   1.68  0.354
12   1996    NHL  DET 68 19 14 3.75 2.22  -0.56   0.55   0.01  0.347
13   1952    NHL  DET 52 14 12 3.06 1.77   0.70  -0.21  -0.49  0.333
14   1973    NHL  MTL 63 13 19 4.22 2.46  -4.02  -3.35   7.37  0.326
15   1956    NHL  MTL 52 17 10 3.30 1.87   2.83  -1.83  -1.00  0.316
16   1982    NHL  NYI 65 19 15 4.71 3.04  -1.37  -2.29   3.66  0.313
17   1974    NHL  BOS 61 21 12 4.32 2.80   0.77  -1.86   1.09  0.298
18   1978    WHA  WNJ 57 23  9 4.87 3.19   0.01   1.95  -1.96  0.281
19   1995    NHL  DET 42 17  7 3.61 2.44   1.13   2.09  -3.22  0.273
20   1985    NHL  PHI 63 25 11 4.11 2.96   5.20  -3.02  -2.18  0.273
21   1985    NHL  EDM 62 22 14 5.07 3.61   0.43  -2.61   2.18  0.265
22   1981    NHL  NYI 62 21 15 4.60 3.14  -1.18  -1.74   2.92  0.265
23   1974    WHA  HSA 58 27  7 4.12 2.77   1.83   4.08  -5.91  0.261
24   1975    NHL  PHI 61 21 15 3.55 2.20  -1.14  -2.83   3.97  0.258
25   1974    NHL  PHI 59 20 15 3.39 2.10   1.51  -2.20   0.69  0.255

Second method (Diff.2 = (Win difference + Tie difference) – Loss difference), teams that translated fewer losses than expected, into any combination of more wins and ties than expected:

   Season League Team  W  L  T   GS   GA W.diff L.diff T.diff Diff.2
1    1977    NHL  MTL 70  9 14 4.71 2.06  -2.69  -1.85   4.54  0.806
2    1944    NHL  MTL 44  6  8 4.59 2.10   1.13  -2.24   1.11  0.793
3    1930    NHL  BOS 38  6  6 3.80 2.16   4.59  -3.73  -0.86  0.760
4    1978    NHL  MTL 70 12 13 4.38 2.22   0.01  -2.16   2.15  0.747
5    1976    NHL  MTL 69 12 11 4.10 2.14   2.37  -1.87  -0.50  0.739
6    1973    NHL  MTL 63 13 19 4.22 2.46  -4.02  -3.35   7.37  0.726
7    1927    NHL  OTS 29  7 14 1.88 1.40   2.25  -4.89   2.64  0.720
8    1941    NHL  BOS 33  9 16 3.26 2.10  -0.64  -6.22   6.86  0.690
9    1972    NHL  BOS 66 15 12 4.24 2.55   4.80  -5.98   1.18  0.677
10   1980    NHL  PHI 59 16 24 4.07 3.08  -0.53  -9.79  10.32  0.677
11   2013    NHL  CHI 36 12 23 2.86 2.01  -3.73  -6.42  10.15  0.662
12   1989    NHL  CGY 62 18 22 4.20 2.71  -2.73  -4.09   6.82  0.647
13   1952    NHL  DET 52 14 12 3.06 1.77   0.70  -0.21  -0.49  0.641
14   1939    NHL  BOS 41 11  8 2.95 1.52   2.53  -0.25  -2.28  0.633
15   1975    NHL  MTL 52 17 22 4.64 2.78  -9.36  -0.88  10.24  0.626
16   1996    NHL  DET 68 19 14 3.75 2.22  -0.56   0.55   0.01  0.624
17   2002    NHL  DET 55 20 30 2.96 2.15  -5.70  -6.60  12.30  0.619
18   2001    NHL  COL 59 20 26 3.14 2.15  -4.11  -5.42   9.53  0.619
19   1929    NHL  MTL 22  9 16 1.55 1.00  -3.53  -1.15   4.68  0.617
20   1982    NHL  NYI 65 19 15 4.71 3.04  -1.37  -2.29   3.66  0.616
21   1945    NHL  MTL 40 11  5 4.45 2.41   1.26   0.27  -1.53  0.607
22   1951    NHL  DET 46 15 15 3.26 1.97  -1.98  -2.99   4.97  0.605
23   1999    NHL  DAL 60 21 24 2.79 1.97   0.85  -4.96   4.11  0.600
24   1984    NHL  EDM 67 20 12 5.40 3.72   4.07  -5.75   1.68  0.596
25   1936    NHL  DET 28 11 15 2.57 1.98   0.44  -6.27   5.83  0.593

…and then the top 25 under-achievers:

First method (= Win difference – (Loss difference + Tie difference))

   Season League Team  W  L  T    GS   GA W.diff L.diff T.diff Diff.1
1    1931    NHL  PHQ  3 35  6 1.705 4.14  -1.04  -1.12   2.16 -0.864
2    1975    NHL  WSH  7 67  5 2.190 5.59   0.31  -0.27  -0.04 -0.823
3    1993    NHL  SJS  8 66 10 2.560 4.87  -5.01   3.50   1.51 -0.810
4    1981    NHL  WIN  9 57 14 3.075 5.00  -4.94   1.20   3.74 -0.775
5    1930    NHL  PTP  5 33  5 2.326 4.16  -2.51   1.50   1.01 -0.767
6    1994    NHL  OTT 10 57 17 2.345 4.68  -2.95  -6.28   9.23 -0.762
7    1993    NHL  OTT 10 64 10 2.405 4.63  -0.43   0.88  -0.45 -0.762
8    1944    NHL  NYR  6 38  5 3.286 6.20  -1.33   0.39   0.94 -0.755
9    2014    NHL  BUF 11 51 20 1.793 2.90  -4.41  -0.96   5.37 -0.732
10   1976    NHL  WSH 11 58 10 2.797 4.94  -2.85   0.90   1.95 -0.722
11   1941    NHL  NYA  7 27 14 2.021 3.81  -2.06  -4.47   6.53 -0.708
12   1995    NHL  OTT  7 33  7 2.383 3.62  -4.00   3.97   0.03 -0.702
13   1990    NHL  QUE 12 60  8 3.000 5.08  -2.56   3.36  -0.80 -0.700
14   1976    NHL  KCS 12 56 12 2.375 4.39  -0.29  -2.49   2.78 -0.700
15   1973    NHL  NYI 12 60  6 2.179 4.45   0.30   1.61  -1.91 -0.692
16   1929    NHL  CBH  7 26 11 0.750 1.86   0.50  -1.12   0.62 -0.682
17   1928    NHL  CBH  7 34  3 1.545 3.05  -1.35   4.53  -3.18 -0.682
18   1929    NHL  PTP  7 23 13 0.953 1.70  -2.11  -0.37   2.48 -0.674
19   1974    NHL  CGS 13 55 10 2.500 4.38  -1.63   0.88   0.75 -0.667
20   1940    NHL  MTL  8 33  7 1.812 3.48  -2.29   1.54   0.75 -0.667
21   2017    NHL  COL 14 56 12 1.927 3.34  -1.80   1.75   0.05 -0.659
22   2015    NHL  BUF 14 51 17 1.854 3.24  -1.29  -3.10   4.39 -0.659
23   2015    NHL  ARI 14 50 18 1.951 3.22  -2.65  -2.31   4.96 -0.659
24   2006    NHL  STL 14 46 22 2.317 3.38  -6.13  -3.21   9.34 -0.659
25   2000    NHL  ATL 14 57 11 2.073 3.77   1.22  -2.32   1.10 -0.659

Second method (= (Win difference + Tie difference) – Loss difference)

   Season League Team  W  L  T   GS   GA W.diff L.diff T.diff Diff.2
1    1975    NHL  WSH  7 67  5 2.19 5.59   0.31  -0.27  -0.04 -0.696
2    1931    NHL  PHQ  3 35  6 1.70 4.14  -1.04  -1.12   2.16 -0.591
3    1993    NHL  SJS  8 66 10 2.56 4.87  -5.01   3.50   1.51 -0.571
4    1944    NHL  NYR  6 38  5 3.29 6.20  -1.33   0.39   0.94 -0.551
5    1928    NHL  CBH  7 34  3 1.55 3.05  -1.35   4.53  -3.18 -0.545
6    1973    NHL  NYI 12 60  6 2.18 4.45   0.30   1.61  -1.91 -0.538
7    1930    NHL  PTP  5 33  5 2.33 4.16  -2.51   1.50   1.01 -0.535
8    1993    NHL  OTT 10 64 10 2.40 4.63  -0.43   0.88  -0.45 -0.524
9    1990    NHL  QUE 12 60  8 3.00 5.08  -2.56   3.36  -0.80 -0.500
10   1976    NHL  WSH 11 58 10 2.80 4.94  -2.85   0.90   1.95 -0.468
11   1954    NHL  CBH 12 51  7 1.90 3.46  -2.62   4.15  -1.53 -0.457
12   1979    WHA  INR  5 18  2 3.12 5.20   0.94   0.01  -0.95 -0.440
13   1981    NHL  WIN  9 57 14 3.08 5.00  -4.94   1.20   3.74 -0.425
14   1974    NHL  CGS 13 55 10 2.50 4.38  -1.63   0.88   0.75 -0.410
15   1995    NHL  OTT  7 33  7 2.38 3.62  -4.00   3.97   0.03 -0.404
16   1976    NHL  KCS 12 56 12 2.38 4.39  -0.29  -2.49   2.78 -0.400
17   2000    NHL  ATL 14 57 11 2.07 3.77   1.22  -2.32   1.10 -0.390
18   1992    NHL  SJS 16 55  9 2.73 4.45  -2.25   3.96  -1.71 -0.375
19   1977    NHL  DET 16 55  9 2.29 3.86  -0.85   2.11  -1.26 -0.375
20   1940    NHL  MTL  8 33  7 1.81 3.48  -2.29   1.54   0.75 -0.375
21   1970    NHL  LAK 14 52 10 2.21 3.82   0.40  -1.84   1.44 -0.368
22   2017    NHL  COL 14 56 12 1.93 3.34  -1.80   1.75   0.05 -0.366
23   1996    NHL  OTT 18 56  8 2.33 3.51  -0.33   4.08  -3.75 -0.366
24   1971    NHL  CGS 20 53  5 2.55 4.10   0.83   3.37  -4.20 -0.359
25   1994    NHL  OTT 10 57 17 2.35 4.68  -2.95  -6.28   9.23 -0.357

OK then, some things are obvious here, and a few things also may need to be reiterated.

First, as a reminder, this is NOT an evaluation of the best and worst teams across NHL and WHA history; that’s not what I’m doing. This is an evaluation of teams that most deviated from what their expected records should have been, positively or negatively, given how they distributed their goals scored and allowed over the games within their season. Different thing entirely–and there is no a priori reason to expect, necessarily, that the two different types of analysis should give similar results. For all we know, entirely great, or mediocre, or awful, teams, are equally likely to either over- or under-achieve, relative to their W/L/T expectation.

This of course is not what the results show. Without getting into any “greatest/worst” arguments, it’s clear that some of the very best NHL teams show up in the first two tables, and some of the very worst show up in the second two. There are exactly zero bad, or even mediocre, teams in the first two tables and zero good or mediocre teams in the second two. That is decidedly not random, and the conclusion must be that really good teams are not so just because they score lots more goals than they give up (obviously). No, they also get more out of the difference between the two, than worse teams do. And similarly, really awful teams are good at producing an even worse record than their goal deficits alone would predict.

More specifically, the first nine entries in the first table, and eight of the first nine in the second, are all either Boston or Montreal, certainly two of the most successful NHL franchises by any standard. Several of the best single seasons, and even multi-year dynasties are represented, such as the 1976-78 Montreal Canadiens juggernaut, and the early 1970s Bruins teams. Note also, that the values for “Diff.2” in the second table are higher than for “Diff.1” in the first table. This indicates that teams that are exceeding expectations are doing so more by converting potential losses into ties or wins, than by converting losses or ties into wins. In other words, they’re primarily losing games less often than expected. There are however, also some differences between the two, showing that the metric used is important. For example, the 2013 Stanley Cup champion Chicago Blackhawks show up only in the second table: they were the only team of the last decade to excel at not losing games that they really “should” have.

Conversely the third and fourth tables display some of the truly awful teams in history, the most recent being the 1993-95 Ottawa Senators and the 1993 San Jose Sharks. In contrast to over-achievers, under-achievers were mainly failing in the win column, i.e. more likely to be failing to win games they should have, rather than losing games they should not have. Occasionally a team, like the 1995 Senators, seems to manage to convert wins directly into losses (four fewer wins and four more losses, with ties about unchanged).

These various results are collectively interesting. They demonstrate that, even though departures from seasonal mean values for goal scoring, and allowing, are very strongly random across most professional hockey seasons in North America, there is nevertheless evidence that really good teams do exceed already high expectations, and similarly that really bad ones do not even achieve their already very low ones. A more thorough analysis on this could be done–and may eventually be–but this’ll have to do for now.
################################################################################

Altered (correct!) data tables and accompanying discussion:

New Table 1: The 25 teams most over-performing their W/L/T expectation. [The W/L/T columns are teams’ actual records, as assessed at the end of regulation time. GS and GA are the mean goals scored and allowed over the season and the “#.diff” columns are the raw differences between actual, and expected W/L/T records (eg, the 1930 Boston Bruins had 4.59 more wins than expected, 3.73 fewer losses, and 0.86 fewer ties). Right-most column is the (per-game standardized) value for the difference metric–it equals the mean of the two variables used above (“Diff.1” and “Diff.2”)]

   Season League Team  W  L  T   GS   GA W.diff L.diff T.diff Diff.3
1    1930    NHL  BOS 38  6  6 3.80 2.16   4.59  -3.73  -0.86 0.1664
2    1995    NHL  EDM 16 25  7 2.81 3.77   3.68  -3.51  -0.17 0.1498
3    1927    NHL  OTS 29  7 14 1.88 1.40   2.25  -4.89   2.64 0.1428
4    2014    NHL  NSH 33 32 17 2.57 2.80   3.98  -7.46   3.48 0.1395
5    1928    NHL  PTP 19 17 10 1.52 1.76   3.24  -2.72  -0.52 0.1296
6    1976    NHL  LAK 41 37 11 3.12 3.28   6.46  -4.73  -1.73 0.1257
7    1936    NHL  DET 28 11 15 2.57 1.98   0.44  -6.27   5.83 0.1243
8    1995    NHL  TOR 23 23  9 2.80 3.05   4.11  -2.49  -1.62 0.1200
9    1967    NHL  TOR 39 31 12 2.90 2.94   5.30  -4.54  -0.76 0.1200
10   1972    NHL  BOS 66 15 12 4.24 2.55   4.80  -5.98   1.18 0.1159
11   1987    NHL  HAR 42 33 11 3.52 3.44   5.61  -4.18  -1.43 0.1138
12   1982    NHL  NYR 44 31 15 3.94 3.86   4.90  -4.70  -0.20 0.1067
13   1959    NHL  BOS 35 31 11 2.94 3.03   4.96  -3.08  -1.88 0.1044
14   1994    NHL  PIT 42 29 19 3.41 3.37   2.88  -6.27   3.39 0.1017
15   2012    NHL  FLA 34 28 27 2.39 2.53   0.43  -8.60   8.17 0.1015
16   1929    NHL  NYA 18 13 15 1.13 1.15   1.03  -3.61   2.58 0.1009
17   1975    WHA  NEW 37 34 13 3.37 3.65   3.13  -5.33   2.20 0.1007
18   2014    NHL  ANA 50 25 20 3.05 2.49   4.42  -5.07   0.65 0.0999
19   1984    NHL  EDM 67 20 12 5.40 3.72   4.07  -5.75   1.68 0.0992
20   1928    NHL  MTM 25 15 12 1.87 1.52   1.97  -3.18   1.21 0.0990
21   1986    NHL  WSH 51 25 13 3.90 3.30   4.91  -3.90  -1.01 0.0990
22   2012    NHL  CGY 32 29 21 2.40 2.56   1.04  -7.06   6.02 0.0988
23   1941    NHL  BOS 33  9 16 3.26 2.10  -0.64  -6.22   6.86 0.0962
24   1988    NHL  BUF 38 35 13 3.53 3.86   4.12  -4.14   0.02 0.0960
25   1947    NHL  TOR 37 22 12 3.35 2.80   2.30  -4.47   2.17 0.0954
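The Diff.3 column can be reproduced from the other columns. The sketch below is an illustration with a hypothetical function name; dividing by W + L + T as the games played is inferred from the table values rather than stated outright.

```python
def diff_metrics(w, l, t, w_diff, l_diff, t_diff):
    """Per-game Diff.1, Diff.2 and their mean (Diff.3) for one team-season."""
    games = w + l + t
    diff1 = (w_diff - (l_diff + t_diff)) / games
    diff2 = ((w_diff + t_diff) - l_diff) / games
    diff3 = (diff1 + diff2) / 2
    return diff1, diff2, diff3


# 1930 Boston Bruins, first row of New Table 1:
d1, d2, d3 = diff_metrics(38, 6, 6, 4.59, -3.73, -0.86)
print(round(d3, 4))  # 0.1664
```

Note that the tie difference cancels when the two metrics are averaged, so Diff.3 reduces to (W.diff – L.diff) divided by games played.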

New Table 2: The 25 teams most under-performing their W/L/T expectation:

   Season League Team  W  L  T   GS   GA W.diff L.diff T.diff  Diff.3
1    2013    NHL  TBL 16 26  6 3.04 3.04  -2.84   5.79  -2.95 -0.1798
2    1995    NHL  OTT  7 33  7 2.38 3.62  -4.00   3.97   0.03 -0.1696
3    1932    NHL  BOS 14 20 14 2.52 2.35  -6.36   0.75   5.61 -0.1481
4    1987    NHL  PIT 25 34 21 3.65 3.58 -10.00   1.62   8.38 -0.1452
5    1934    NHL  OTS 12 27  9 2.35 2.90  -3.40   3.02   0.38 -0.1338
6    1928    NHL  CBH  7 34  3 1.55 3.05  -1.35   4.53  -3.18 -0.1336
7    1936    NHL  MTL 11 24 13 1.71 2.52  -4.33   2.01   2.32 -0.1321
8    1982    NHL  MTL 48 18 18 4.46 2.74  -8.86   1.71   7.15 -0.1258
9    1931    NHL  NYR 20 16 12 2.35 1.85  -4.83   1.18   3.65 -0.1252
10   1975    NHL  BOS 41 27 15 4.34 3.08  -7.28   2.84   4.44 -0.1219
11   2007    NHL  NYR 35 33 24 2.82 2.48  -9.38   1.66   7.72 -0.1200
12   1934    NHL  BOS 14 25  8 2.11 2.64  -2.22   3.13  -0.91 -0.1138
13   1985    NHL  WSH 42 28 14 3.83 2.99  -5.29   4.15   1.14 -0.1124
14   1992    NHL  MTL 38 30 23 3.18 2.57  -9.27   0.95   8.32 -0.1123
15   2011    NHL  NYI 19 39 24 2.66 3.06  -9.41  -0.35   9.76 -0.1105
16   1984    NHL  PIT 15 53 12 3.16 4.81  -5.57   2.90   2.67 -0.1059
17   1931    NHL  OTS  9 29  6 2.02 3.20  -2.41   2.21   0.20 -0.1050
18   1948    NHL  CBH 19 34  6 3.24 3.76  -1.99   4.20  -2.21 -0.1049
19   1973    WHA  CHC 23 44 11 3.10 3.71  -3.90   4.22  -0.32 -0.1041
20   1970    NHL  PHI 17 35 24 2.59 2.96  -9.51  -1.60  11.11 -0.1041
21   2006    NHL  NYR 33 30 23 2.91 2.56  -8.08   0.80   7.28 -0.1033
22   1993    NHL  SJS  8 66 10 2.56 4.87  -5.01   3.50   1.51 -0.1013
23   1928    NHL  MTL 25 10 11 2.54 1.07  -4.00   0.66   3.34 -0.1013
24   2013    NHL  OTT 21 22 14 2.37 2.23  -4.44   1.26   3.18 -0.1000
25   1994    NHL  TBL 26 39 18 2.60 2.95  -5.95   2.31   3.64 -0.0995

OK then, some things are obvious here, and a few things also may need to be reiterated.

First, as a reminder, this is not an evaluation of the best and worst teams across NHL/WHA history; that’s not what I’m doing. This is an evaluation of teams that most deviated from what their expected records should have been, positively or negatively, given how they distributed their goals scored and allowed over the games within their season. Different thing entirely–and there is no a priori reason to expect, necessarily, that the two different types of analysis should give similar results. For all we know, entirely great, or mediocre, or awful, teams, are equally likely to either over- or under-achieve, relative to their W/L/T expectation.

One conclusion that does not change qualitatively (but does quantitatively), relative to the mistaken analysis above, is that over-achieving teams tend (strongly) to be teams that are already good, whereas under-achieving teams tend, equally strongly, to be teams that are already bad. Specifically, the combined (summed) W-L-T record for the 25 teams in New Table 1, is 917-583-343. Standardized to the current (82 game) schedule, that’s 41-26-15. That’ll easily get you into the playoffs, as currently structured, every year. Conversely, for New Table 2, the record sum is 545-776-346, which is 27-38-17 over 82 games…and that will not get you in. Using the mistaken data, those good and bad records were even better, and worse, respectively.
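The 82-game standardization is just proportional rescaling of the summed record. A quick sketch, with a hypothetical function name:

```python
def per_schedule(w, l, t, schedule=82):
    """Rescale a summed W-L-T record to a season of `schedule` games."""
    total = w + l + t
    return tuple(round(x * schedule / total) for x in (w, l, t))


print(per_schedule(917, 583, 343))  # (41, 26, 15)
print(per_schedule(545, 776, 346))  # (27, 38, 17)
```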

So, the principal conclusion of the analysis still holds, and it’s the take home message of the analysis: even though departures from seasonal mean values for goal scoring and allowing very strongly follow a random process across NHL/WHA history, there is nevertheless strong evidence that teams that most exceed their record expectations, tend to be good teams already, whereas teams that most under-achieve tend to be bad already.

What does differ, between the faulty and corrected data and analyses, is the list of teams making the two lists. Initially, not a single bad or mediocre team was on the lists of over-achievers, and similarly, not a single good or mediocre team was on the lists of under-achievers. With the corrected data, these exceptions do occur, and it’s interesting to look at them. For example, four years ago the Nashville Predators were 33-32-17 (remember, assessed at the end of regulation time): about as mediocre as you can get. But according to their goals scored and allowed distributions, they should have finished 29-39-14. They were outscored by about 0.25 goals per game but still managed to finish above .500. They still didn’t make the playoffs but didn’t miss by much, and their over-performance, if they’d been just a little better to start with, would have gotten them in.

Conversely, the 1982 Montreal Canadiens were 48-18-18, terrific. They outscored their opponents by an average of 1.72 goals per game, enormous. They should thereby have been one of the greater teams in NHL history, finishing at roughly 57-16-11 or so. But no; they failed to win nine games that they really should have, and also lost a couple they shouldn’t have. Similar story for the 1984/85 Washington Capitals, and some others as well. And then there were the 1993 San Jose Sharks…the Sharks were terrible to begin with, but somehow managed to make it even worse by not winning five games they by rights should have, finishing at 8-66-10, one of the very worst NHL teams ever.

The point of this discussion is important. If goals scored and allowed are truly random (= Poissonian) processes around the mean values, then the distribution of differences therein (i.e. game scores) will follow a Skellam distribution, and departures from expected W-L-T records will also be random. We should therefore see, in each of the two tables above, a mixture of good, bad, and mediocre teams collectively assembling to about a .500 record. In a truly random process, deviation from expectation should not be related to inherent team skill–there is no a priori reason to expect that to be the case. The results indicate instead that, at least for the top 25 over- and under-achievers, there is a synergism, in which teams that already “know how to win”, in fact win even more than is already expected, and the same is true in the opposite direction for teams that already “know how to lose”. And that’s a finding.

In conclusion, it should go without saying that you should keep your stick on the ice and stay out of the penalty box, unless absolutely warranted. I personally wouldn’t take that octopus home and eat it either, and I’m fairly hungry.

This is the official termination of this post and you will now have to find something else to do.
