Stanley Cup Finals time and a look at some stats

Tonight, the Stanley Cup Finals of the National Hockey League begin, between the NY Rangers and LA Kings. The first three rounds have been highly entertaining, especially in the Western Conference, where the Kings have pulled off some truly amazing feats in running through a gauntlet of three of the league’s top teams: San Jose, Anaheim and Chicago. The Kings are fairly heavy favorites to win their second Cup in three years.
The LA Kings pulled another rabbit out of their helmets against the Chicago Blackhawks to advance to the Stanley Cup Finals against the New York Rangers.

One topic I’m always interested in is how sports leagues and their playoffs are structured, which I don’t think gets nearly enough attention compared to other issues like personnel, trades, salaries and team strategy. The way that teams are grouped into divisions and conferences can have a strong effect on who does and does not make the playoffs, although the NHL is better than, say, baseball in that respect, and far better in terms of the structure of the playoff rounds themselves.

I especially like to look at the overall strengths of divisions; in most sports, teams play more games against other teams in their own division (and/or conference, league or higher level grouping) than against teams outside it. Teams that happen to play in relatively weak groupings have a definite advantage over those playing in stronger ones. With that in mind I wanted to look at records, and goals scored and allowed, in NHL inter-division games vs intra-division games, and similarly, within and between conferences (click on the link above to see the NHL league structure and final season standings).

But to my surprise, none of the go-to hockey stats sites (Hockey Reference, ESPN, etc.) had that information broken out, even though it is fairly standard fare in baseball. So I had to do it myself. This entailed writing some code in R (when I should have been working; two minutes in the box for delay of game). I used the comma-separated scores from all 1234 NHL regular season games as the data source, saved as a file named “gameresults2014.txt”. A second file, “teams2014.csv”, lists the name of each of the 30 teams, its conference, and its division. The native structure of the games file is by home vs road team, which is not useful for my purpose, so I had to reformat it before analyzing.
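A minimal sketch of what that kind of reformatting can look like: it turns a home-vs-road table into one row per team per game, so that each team's goals for and against always sit in the same two columns. The column names (Visitor, Vis.goals, Home, Home.goals), the helper to_long() and the three toy games are assumptions for illustration only, not the contents of the actual file.

```r
# Toy home-vs-road table; the scores are made up for illustration.
games <- data.frame(
  Visitor    = c("LAK", "NYR", "CHI"),
  Vis.goals  = c(3, 1, 2),
  Home       = c("NYR", "CHI", "LAK"),
  Home.goals = c(2, 4, 2),
  stringsAsFactors = FALSE
)

# One row per team per game: stack the visitor's view on top of the home view.
# GF = goals for, GA = goals against.
to_long <- function(g) {
  vis  <- data.frame(Team = g$Visitor, Opp = g$Home,
                     GF = g$Vis.goals, GA = g$Home.goals,
                     Site = "road", stringsAsFactors = FALSE)
  home <- data.frame(Team = g$Home, Opp = g$Visitor,
                     GF = g$Home.goals, GA = g$Vis.goals,
                     Site = "home", stringsAsFactors = FALSE)
  rbind(vis, home)
}

long <- to_long(games)
print(long)
```

Every game shows up twice in the long table, once from each side, so total goals for and total goals against must be equal; that makes a handy sanity check.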

Looking strictly at the number of goals scored and allowed (not records), there were very clear differences in strength between divisions. In particular, the Central Division was easily the strongest, outscoring opponents from other divisions by 0.28 goals/game, and the Atlantic Division the weakest, being outscored by about 0.25 goals/game. The other two divisions (Metro and Pacific) had goal differentials very near zero. Accordingly, the Western Conference was quite a bit stronger than the Eastern, outscoring it by about 0.23 goals/game. However, the six pairwise comparisons between divisions showed that the largest difference was actually between the Central and Pacific divisions of the Western Conference, with the former outscoring the latter by about 0.42 goals/game. This is likely due in part to the Pacific containing two of the league’s weakest teams in Edmonton and Calgary, although I didn’t check that. The Kings and Rangers played each other only twice during the season, so there wasn’t much sense in analyzing that data either.
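To make the goals/game differential concrete, here is a small sketch of the calculation on four invented inter-division games. The division codes (A, M, C, P) follow the ones used in the code below; the helper goal_diff(), the column names and the scores are assumptions for illustration and do not reproduce the figures quoted above.

```r
# Made-up inter-division games: each row has both teams' divisions and goals.
g <- data.frame(
  Vis.div    = c("C", "A", "P", "C"),
  Vis.goals  = c(4, 1, 2, 3),
  Home.div   = c("A", "C", "C", "P"),
  Home.goals = c(2, 3, 2, 1),
  stringsAsFactors = FALSE
)

# Mean goals for minus mean goals against for one division, counting only
# games in which exactly one of the two teams belongs to that division.
goal_diff <- function(g, div) {
  as.vis  <- g$Vis.div  == div & g$Home.div != div
  as.home <- g$Home.div == div & g$Vis.div  != div
  gf <- c(g$Vis.goals[as.vis],  g$Home.goals[as.home])   # goals for
  ga <- c(g$Home.goals[as.vis], g$Vis.goals[as.home])    # goals against
  mean(gf) - mean(ga)
}

# Differential for each division that appears in the toy data
sapply(c("A", "C", "P"), function(d) goal_diff(g, d))
```

A positive value means the division outscored its opponents from other divisions; a negative value means it was outscored.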

R code:

setwd("C:/Old Computer Files/From HP2/Documents/Hockey")

# Read in the game scores for all games
games = read.csv("gameresults2014.txt",header=T,stringsAsFactors=F)[,1:5]

# Get team names, divisions, conferences, from "teams2014.csv"
teams = read.csv("teams2014.csv",header=T,stringsAsFactors=F)[,1:3]
teams$ps = 0; teams$ps[c(1:4,9:12,17:21,24:26)] = 1
playoffs = teams[which(teams$ps==1),1:3]

# Merge information from the two files, and rearrange:
# Note: can't get the "merge" function to work, AS USUAL!!! (most obtuse R function EVER), therefore use a "for" loop:
games$Vis.conf=NA; games$Vis.div=NA; games$Home.conf=NA; games$Home.div=NA
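for (i in 1:nrow(games)){
 team.vis = games[i,2]; team.home = games[i,4]
 games$Home.conf[i] = teams[which(teams$Team==team.home),2]
 games$Home.div[i] = teams[which(teams$Team==team.home),3]
 games$Vis.conf[i] = teams[which(teams$Team==team.vis),2]
 games$Vis.div[i] = teams[which(teams$Team==team.vis),3]
}
# Rearrange columns to: date, visitor, conf, div, goals, home, conf, div, goals
# (this step is missing from the listing; the order is inferred from the column indices used below)
games = games[,c(1,2,6,7,3,4,8,9,5)]
games$sum = games[,5] + games[,9]	# total goals in each game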

# flag the games in which both teams made the playoffs
games$PS = 0
for (i in 1:nrow(games)){
 if (any(playoffs[,1] == games[i,2]) && any(playoffs[,1] == games[i,6])) games$PS[i] = 1
}

## For each game, count how many of the two teams (0, 1 or 2) belong to each of the 6 groupings;
## a count of 1 marks a game between groupings, a count of 2 a game within one.
# EC = Eastern Conf., AD = Atlantic Div., etc
games$EC = apply(games[,c(3,7)], 1 , function(x) sum(x=="E"))
games$WC = apply(games[,c(3,7)], 1 , function(x) sum(x=="W"))
games$AD = apply(games[,c(4,8)], 1 , function(x) sum(x=="A"))
games$MD = apply(games[,c(4,8)], 1 , function(x) sum(x=="M"))
games$CD = apply(games[,c(4,8)], 1 , function(x) sum(x=="C"))
games$PD = apply(games[,c(4,8)], 1 , function(x) sum(x=="P"))

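# Conference level:
# inter.conf = games between conferences (definition missing from the listing; reconstructed from the pattern used for the divisions below)
inter.conf = games[which(games$EC==1),]
inter.conf.w1 = which(inter.conf[,3]=="W")
inter.conf.w2 = which(inter.conf[,7]=="W")
west.goals = c(inter.conf[inter.conf.w1,5], inter.conf[inter.conf.w2, 9])
east.goals = c(inter.conf[inter.conf.w1,9], inter.conf[inter.conf.w2, 5])
mean(west.goals,na.rm=T); mean(east.goals,na.rm=T)		# goals per game, West vs East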

# Goals per team per game:
mean(c(games$Vis.goals, games$Home.goals),na.rm=T)		# All games, all teams

## Inter-division goals, each division against all others combined:
atla = games[which(games$AD==1),]
atla1 = which(atla[,4]=="A"); atla2 = which(atla[,8]=="A")
mean(c(atla[atla1,5], atla[atla2,9]),na.rm=T)			# the target division
mean(c(atla[atla1,9], atla[atla2,5]),na.rm=T)			# the opponent's division

metr = games[which(games$MD==1),]
metr1 = which(metr[,4]=="M"); metr2 = which(metr[,8]=="M")
mean(c(metr[metr1,5], metr[metr2,9]),na.rm=T)
mean(c(metr[metr1,9], metr[metr2,5]),na.rm=T)

cent = games[which(games$CD==1),]
cent1 = which(cent[,4]=="C"); cent2 = which(cent[,8]=="C")
mean(c(cent[cent1,5], cent[cent2,9]),na.rm=T)
mean(c(cent[cent1,9], cent[cent2,5]),na.rm=T)

paci = games[which(games$PD==1),]
paci1 = which(paci[,4]=="P"); paci2 = which(paci[,8]=="P")
mean(c(paci[paci1,5], paci[paci2,9]),na.rm=T)
mean(c(paci[paci1,9], paci[paci2,5]),na.rm=T)

## Inter-division goals, pairwise:
# at.me = Atlantic vs Metro Division games, etc.
at.me = games[which(games$AD==1 & games$MD==1),]
at.me1 = which(at.me[,4]=="A"); at.me2 = which(at.me[,8]=="A")
mean(c(at.me[at.me1,5], at.me[at.me2,9]),na.rm=T)
mean(c(at.me[at.me1,9], at.me[at.me2,5]),na.rm=T)

at.ce = games[which(games$AD==1 & games$CD==1),]
at.ce1 = which(at.ce[,4]=="A"); at.ce2 = which(at.ce[,8]=="A")
mean(c(at.ce[at.ce1,5], at.ce[at.ce2,9]),na.rm=T)
mean(c(at.ce[at.ce1,9], at.ce[at.ce2,5]),na.rm=T)

at.pa = games[which(games$AD==1 & games$PD==1),]
at.pa1 = which(at.pa[,4]=="A"); at.pa2 = which(at.pa[,8]=="A")
mean(c(at.pa[at.pa1,5], at.pa[at.pa2,9]),na.rm=T)
mean(c(at.pa[at.pa1,9], at.pa[at.pa2,5]),na.rm=T)

me.ce = games[which(games$MD==1 & games$CD==1),]
me.ce1 = which(me.ce[,4]=="M"); me.ce2 = which(me.ce[,8]=="M")
mean(c(me.ce[me.ce1,5], me.ce[me.ce2,9]),na.rm=T)
mean(c(me.ce[me.ce1,9], me.ce[me.ce2,5]),na.rm=T)

me.pa = games[which(games$MD==1 & games$PD==1),]
me.pa1 = which(me.pa[,4]=="M"); me.pa2 = which(me.pa[,8]=="M")
mean(c(me.pa[me.pa1,5], me.pa[me.pa2,9]),na.rm=T)
mean(c(me.pa[me.pa1,9], me.pa[me.pa2,5]),na.rm=T)

ce.pa = games[which(games$CD==1 & games$PD==1),]
ce.pa1 = which(ce.pa[,4]=="C"); ce.pa2 = which(ce.pa[,8]=="C")
mean(c(ce.pa[ce.pa1,5], ce.pa[ce.pa2,9]),na.rm=T)
mean(c(ce.pa[ce.pa1,9], ce.pa[ce.pa2,5]),na.rm=T)
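For what it's worth, the "for" loop that looks up each team's conference and division can probably be replaced by one vectorized step using base R's match(), which avoids merge() altogether. This is only a sketch: the column names Team, Conf, Div, Visitor and Home and the three-team toy tables are assumptions for illustration, not the layout of the actual files.

```r
# Hypothetical lookup table: team, conference, division.
teams <- data.frame(
  Team = c("LAK", "NYR", "CHI"),
  Conf = c("W", "E", "W"),
  Div  = c("P", "M", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical schedule (teams only; scores left out).
games <- data.frame(
  Visitor = c("LAK", "NYR", "CHI"),
  Home    = c("NYR", "CHI", "LAK"),
  stringsAsFactors = FALSE
)

# match() returns, for each game, the row of 'teams' that holds that team.
vis.row  <- match(games$Visitor, teams$Team)
home.row <- match(games$Home,    teams$Team)

# Index the lookup table by those rows to fill in all games at once.
games$Vis.conf  <- teams$Conf[vis.row]
games$Vis.div   <- teams$Div[vis.row]
games$Home.conf <- teams$Conf[home.row]
games$Home.div  <- teams$Div[home.row]

print(games)
```

Unlike merge(), which sorts the result by the key column by default, this keeps the games in their original order. A team name that is missing from the lookup table comes back as NA rather than raising an error, so a quick anyNA(vis.row) check is a cheap safeguard.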

Have at it
