Well they’ve been running around on the flat expanses of the early Holocene lake bed with impressively large machines, whacking down and gathering the soybeans and corn. This puts dirt clods on the roads that cause one on a road bike at dusk to weave and swear, but I digress. The Farmer’s Almanac indicates says that it must therefore be about World Series time, which in turn is just about approximately guaranteed to initiate various comments regarding the role of luck, good or bad, in deciding important baseball game outcomes.
There are several important things to be blurted out on this important topic and with the Series at it’s climax and the leaves a fallin’ now’s the time, the time is now.
It was Bill James, the baseball “sabermetric” grandpa and chief guru, who came up with the basic idea some time ago, though not with the questionable terminology applied to it I think, which I believe came later from certain disciples who knelt at his feet.
The basic idea starts off well enough but from there goes into a kind of low-key downhill slide, not unlike the truck that you didn’t bother setting the park brake for because you thought the street grade was flat but found out otherwise a few feet down the sidewalk. At which point you also discover that the bumper height of said truck does not necessarily match that of a Mercedes.
The concept applies not just to baseball but anything involving integer scores. Basic idea is as follows (see here). Your team plays 162 baseball games, 25 soccer matches or whatever, and of course you keep score of each. You then compute the fraction S^x/(S^x + A^x), where using the baseball case, S = runs scored, A = runs allowed and x = an exponent that varies depending on the data used (i.e. the teams and years used). You do this for each team in the league and also compute each team’s winning percentage (WP = W/G, where W = number of wins and G = games played in the season(s)). A nonlinear regression/optimization returns the optimal value of x, given the data. The resulting fraction is known as the “pythagorean expectation” of winning percentage, claiming to inform us of how many games a given team “should” have won and lost over that time, given their total runs scored and allowed.
Note first that the value of x depends on the data used: the relationship is entirely empirically derived, and exponents ranging from (at least) 1.8 to 2.0 have resulted. There is no statistical theory here whatsoever, and in no description of “the pythag” have I ever seen any mention of such. This is a shame because (1) there can and should be, and (2) it seems likely that most “sabermatricians” don’t have any idea as to how or why. Maybe not all, but I haven’t seen any discuss the matter. Specifically, this is a classic case for application of Poisson-derived expectations.
However the lack of theory is one, but not really the main, point here. More at issue are the highly questionable interpretations of the causes of observed deviations from pythag expectations, where the rolling truck smashes out the grill and lights of the Mercedes.
You should base an analysis like this on the Poisson distribution for at least two very strong reasons. First, interpretations of the pythag always involve random chance. That is, the underlying view is that departures of a given team’s won-loss record from pythag expectation is always attributed to the action of randommness–random chance. Great, if you want to go down that road, that’s exactly what the Poisson distribution is designed to address. Secondly, it will give you additional information regarding the role of chance that you cannot get from “the pythag”.
Indeed, the Poisson gives the expected distribution of integer-valued data around a known mean, under the assumption that random deviations from that mean are solely the result of sampling error, which in turn results from the combination of
Complete Spatial Randomness (CSR) complete randomness of the objects, relative to the mean value and the size of the sampling frame. In our context, the sampling frame is a single game and the objects of analysis are the runs scored, and allowed, in each game. The point is that the Poisson is inherently designed to test just exactly what the SABR-toothers are wanting to test. But they don’t use it–they instead opt for the fully ad-hoc pythag estimator (or slight variations thereof). Always.
So, you’ve got a team’s total runs scored and allowed over its season. You divide that by the number of games played to give you the mean of each. That’s all you need–the Poisson is a single parameter distribution, the variance being a function of the mean. Now you use that computer in front of you for what it’s really ideal at–doing a whole bunch of calculations really fast–to simply draw from the runs scored, and runs allowed, distributions, randomly, say 100,000 times or whatever, to estimate your team’s real expected won-loss record under a fully random score distribution process. But you can also do more–you can test whether either the runs scored or allowed distribution fits the Poisson very well, using a chi-square goodness-of-fit test. And that’s important because it tells you basically, whether or not they are homogeneous random processes–processes in which the data generating process is unchanging through the season. In sports terms: it tells you the degree to which the team’s performance over the year, offensive and defensive, came from the same basic conditions (i.e. unchanging team performance quality/ability).
The biggest issue remains however–interpretation. I don’t how it all got started, but somewhere, somebody decided that a positive departure from “the pythag” (more wins than expected) equated to “good luck” and negative departures to “bad luck”. Luck being the operative word here. Actually I do know the origin–it’s a straight forward conclusion from attributing all deviations from expectation to “chance”. The problem is that many of these deviations are not in fact due to chance, and if you analyze the data using the Poisson as described above, you will have evidence of when it is, and is not, the case.
For example, a team that wins more close games than it “should”, games won by say just one or two runs, while getting badly smoked in a small subset of other games, will appear to benefit from “good luck”, according to the pythag approach. But using the Poisson approach, you can identify whether or not a team’s basic quality likely changed at various times during the season. Furthermore, you can also examine whether the joint distribution of events (runs scored, runs allowed), follows random expectation, given their individual distributions. If they do not, then you know that some non-random process is going on. For example, that team that wins (or loses) more than it’s expected share of close games most likely has some ability to win (or lose) close games–something about the way the team plays explains it, not random chance. There are many particular explanations, in terms of team skill and strategy, that can explain such results, and more specific data on a team’s players’ performance can lend evidence to the various possibilities.
So, the whole “luck” explanation that certain elements of the sabermetric crowd are quite fond of and have accepted as the Gospel of James, may be quite suspect at best, or outright wrong. I should add however that if the Indians win the series, it’s skill all the way while if the Cubs win it’ll most likely be due to luck.