Well they’ve been running around on the flat expanses of the early Holocene lake bed with impressively large machines, whacking down and gathering the soybeans and corn. This puts dirt clods on the roads that cause one on a road bike at dusk to weave and swear, but I digress. The Farmer’s Almanac says that it must therefore be about World Series time, which in turn is just about guaranteed to initiate various comments regarding the role of luck, good or bad, in deciding important baseball game outcomes.

There are several important things to be blurted out on this important topic, and with the Series at its climax and the leaves a fallin’, now’s the time, the time is now.

It was Bill James, the baseball “sabermetric” grandpa and chief guru, who came up with the basic idea some time ago–though not, I think, with the questionable terminology applied to it, which I believe came later from certain disciples who knelt at his feet.

The basic idea starts off well enough but from there goes into a kind of low-key downhill slide, not unlike the truck that you didn’t bother setting the park brake for because you thought the street grade was flat but found out otherwise a few feet down the sidewalk. At which point you also discover that the bumper height of said truck does not necessarily match that of a Mercedes.

The concept applies not just to baseball but to anything involving integer scores. The basic idea is as follows (see here). Your team plays 162 baseball games, 25 soccer matches or whatever, and of course you keep score of each. You then compute the fraction S^x/(S^x + A^x), where, in the baseball case, S = runs scored, A = runs allowed and x = an exponent that varies *depending on the data used* (i.e. the teams and years used). You do this for each team in the league and also compute each team’s winning percentage (WP = W/G, where W = number of wins and G = games played in the season(s)). A nonlinear regression/optimization returns the optimal value of x, given the data. The resulting fraction is known as the “pythagorean expectation” of winning percentage, claiming to inform us of how many games a given team “should” have won and lost over that time, given its total runs scored and allowed.
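For concreteness, here’s a minimal sketch of that fit in Python, with made-up team totals and a crude grid search standing in for a proper nonlinear optimizer (all numbers are hypothetical, not real season data):

```python
# Sketch of fitting the pythag exponent x by least squares.
# Each tuple is (runs scored S, runs allowed A, wins W, games G)
# for one team -- hypothetical data for illustration only.
teams = [(800, 700, 90, 162), (700, 750, 76, 162), (650, 820, 62, 162)]

def sse(x):
    """Sum of squared errors between pythag expectation and actual WP."""
    total = 0.0
    for S, A, W, G in teams:
        pythag = S**x / (S**x + A**x)   # the pythagorean expectation
        total += (pythag - W / G) ** 2  # compare to winning percentage
    return total

# Crude grid search over exponents 1.00 .. 3.00; a real analysis would
# use a nonlinear optimizer, but the idea is the same.
candidates = [1.0 + 0.01 * i for i in range(201)]
x_opt = min(candidates, key=sse)
```

With real league data this is where the familiar fitted exponents in the 1.8–2.0 range come from; the point is simply that x is whatever minimizes the error for the data at hand.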

Note first that the value of x depends on the data used: the relationship is entirely empirically derived, and exponents ranging from (at least) 1.8 to 2.0 have resulted. There is no statistical theory here whatsoever, and in no description of “the pythag” have I ever seen any mention of such. This is a shame because (1) there can and should be, and (2) it seems likely that most “sabermetricians” don’t have any idea as to how or why. Maybe not all, but I haven’t seen any discuss the matter. Specifically, this is a classic case for application of Poisson-derived expectations.

However the lack of theory is one, but not really the main, point here. More at issue are the highly questionable interpretations of the *causes of observed deviations from pythag expectations*, where the rolling truck smashes out the grill and lights of the Mercedes.

You should base an analysis like this on the Poisson distribution for at least two very strong reasons. First, interpretations of the pythag always involve random chance. That is, the underlying view is that departures of a given team’s won-loss record from pythag expectation are always attributed to the action of randomness–random chance. Great, if you want to go down that road, that’s exactly what the Poisson distribution is designed to address. Secondly, it will give you additional information regarding the role of chance that you cannot get from “the pythag”.

Indeed, the Poisson gives the expected distribution of integer-valued data around a known mean, under the assumption that random deviations from that mean are solely the result of sampling error, which in turn results from the complete randomness of the objects, relative to the mean value and the size of the sampling frame. In our context, the sampling frame is a single game and the objects of analysis are the runs scored, and allowed, in each game. The point is that the Poisson is inherently designed to test exactly what the SABR-toothers are wanting to test. But they don’t use it–they instead opt for the fully ad-hoc pythag estimator (or slight variations thereof). Always.

So, you’ve got a team’s total runs scored and allowed over its season. You divide each by the number of games played to get the mean of each. That’s all you need–the Poisson is a single-parameter distribution, the variance being a function of the mean. Now you use that computer in front of you for what it’s really ideal at–doing a whole bunch of calculations really fast–to simply draw from the runs scored, and runs allowed, distributions, randomly, say 100,000 times or whatever, to estimate your team’s real expected won-loss record under a fully random score distribution process. But you can also do more–you can test whether either the runs scored or allowed distribution fits the Poisson very well, using a chi-square goodness-of-fit test. And that’s important because it tells you, basically, whether or not they are *homogeneous* random processes–processes in which the data generating process is unchanging through the season. In sports terms: it tells you the degree to which the team’s performance over the year, offensive and defensive, came from the same basic conditions (i.e. unchanging team performance quality/ability).
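A bare-bones sketch of that Monte Carlo step (standard-library Python only; the team means, trial count, and seed are made up for illustration, and tied draws are simply discarded since baseball games can’t end tied):

```python
import math
import random

def poisson_draw(lam, rng):
    """Draw from Poisson(lam) using Knuth's multiplicative method."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def expected_wins(mean_scored, mean_allowed, games=162,
                  trials=20000, seed=42):
    """Expected wins if each game's runs scored and allowed were
    independent Poisson draws around the team's season means.
    Ties are discarded (conditioning on decided games)."""
    rng = random.Random(seed)
    wins = decided = 0
    for _ in range(trials):
        s = poisson_draw(mean_scored, rng)
        a = poisson_draw(mean_allowed, rng)
        if s != a:
            decided += 1
            if s > a:
                wins += 1
    return games * wins / decided  # scale win rate to a full season

# e.g. a hypothetical team scoring 5.5 and allowing 4.0 runs per game
w = expected_wins(5.5, 4.0)
```

The chi-square goodness-of-fit step would then bin the actual per-game run totals and compare observed counts against the Poisson expectations computed from the same mean.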

The biggest issue remains however–interpretation. I don’t know how it all got started, but somewhere, somebody decided that a positive departure from “the pythag” (more wins than expected) equated to “good luck” and negative departures to “bad luck”. Luck being the operative word here. Actually I do know the origin–it’s a straightforward conclusion from attributing all deviations from expectation to “chance”. The problem is that many of these deviations are *not* in fact due to chance, and if you analyze the data using the Poisson as described above, you will have evidence of when that is, and is not, the case.

For example, a team that wins more close games than it “should”, games won by say just one or two runs, while getting badly smoked in a small subset of other games, will appear to benefit from “good luck”, according to the pythag approach. But using the Poisson approach, you can identify whether or not a team’s basic quality likely changed at various times during the season. Furthermore, you can also examine whether the *joint distribution* of events (runs scored, runs allowed) follows random expectation, given their individual distributions. If it does not, then you know that some non-random process is going on. For example, that team that wins (or loses) more than its expected share of close games most likely has some *ability* to win (or lose) close games–something about the way the team plays explains it, not random chance. There are many particular explanations, in terms of team skill and strategy, that can explain such results, and more specific data on a team’s players’ performance can lend evidence to the various possibilities.
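A sketch of that kind of check (hypothetical means; the sampler is again the textbook Knuth method): simulate the share of decided games with a one-run margin under independence, then compare it to a team’s observed share.

```python
import math
import random

def poisson_draw(lam, rng):
    """Draw from Poisson(lam) using Knuth's multiplicative method."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def one_run_share(mean_scored, mean_allowed, trials=50000, seed=0):
    """Under independent Poisson scoring, what fraction of decided
    games end with a one-run margin?  A team whose observed share of
    one-run *wins* greatly exceeds this suggests skill, not chance."""
    rng = random.Random(seed)
    one_run = decided = 0
    for _ in range(trials):
        s = poisson_draw(mean_scored, rng)
        a = poisson_draw(mean_allowed, rng)
        if s != a:
            decided += 1
            if abs(s - a) == 1:
                one_run += 1
    return one_run / decided

share = one_run_share(4.5, 4.5)  # hypothetical league-average means
```

The same machinery extends to the full joint distribution: tabulate simulated (scored, allowed) pairs and compare the table against the team’s actual game scores.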

So, the whole “luck” explanation that certain elements of the sabermetric crowd are quite fond of and have accepted as the Gospel of James, may be quite suspect at best, or outright wrong. I should add however that if the Indians win the series, it’s skill all the way while if the Cubs win it’ll most likely be due to luck.

Jim said:

Well they’ve been running around on the flat expanses of the early Holocene lake bed with impressively large machines, whacking down and gathering the soybeans and corn. This puts dirt clods on the roads that cause one on a road bike at dusk to weave and swear, but I digress

Hey, our future food supply has to be gathered in… think of those hulking, dirt clod spewing machines as giant squirrels, gathering nuts for winter. You get the luxury of riding your road bike at dusk because you are not otherwise engaged in gathering your winter sustenance. But now it is me digressing.

The baseball fortunes are certainly in the wind. Tied 3 games a piece, the mighty Great Lakes ball teams are set to decide it all on the shores of Lake Erie this evening. To the loser tonight it might seem like the wreck of the Edmund Fitzgerald (but on the wrong lake).

To the more salient points at hand – I do like the subject of fitness and population genetics. Having to get out there on the land and captain a much smaller version of the dirt spewing soybean gathering machines, I haven’t the necessary time to do this justice… but for a single sentence or so I would offer that looking at the change in fitness over time is a bit strange to me. Fitness is a result – the measured effect of various and sundry causes. Where environmental stimuli are quite variable (drought, late or early frost, fire, etc.), populations with the means to reproduce better relative to other populations within the same theater might see their fitness spike relative to some historic trend line of fitness within the ecosystem. Is this not a good thing (for this particular species)?

Definitely right Clem on the benefits of the harvest. To be clear, I like the harvesting, and especially the harvest season. The silent swearing is directed neither at the harvesters nor their machines, but rather comes involuntarily from my lower spine at the end of a long ride when I hit a clod, directed at no particular thing or one. And I’m always checking out the fields and woodlots as I ride–I love it. But is it just me or does the harvest occur later than it used to? There’s still a lot of standing corn around here (maybe 1/2 down?) and even some beans here and there (maybe 1/10?)–I can’t see the advantage gained.

I see fitness as a continual work-in-progress and a spike in fitness is a good thing for that environment in which it occurs, but a potentially bad thing when the latter changes. The conservative bet hedger argument (Orr etc.) is that fluctuating fitness, resulting from high adaptation of different genotypes to differing environments, should eventually lose out to a system in which the fluctuations are less, for the same environmental variation, due to an evolution toward vanilla genotypes. I think this conclusion has all kinds of serious problems.

Yep Great Lakes World Series, and between the chief rivals of our favorite teams no less. Cubs fans will be getting friendly with this medicine, tonight after Cubs go down faster than, well…

I’m not sure that harvest is any later than usual. In fact one might argue that this year’s harvest in much of Ohio is in line with or a touch earlier than typical. I do imagine the corn harvest is slower than it might be (field conditions have been very good in many places, and equipment available) but I think this is due to fair forecasts, corn stalks in good condition, and the desire to allow the crop to dry in the field as much as it might. Corn price is very low right now and paying to dry the corn takes even more of the margin away (not to mention the carbon footprint that goes along with artificial drying).

I’m hearing reports that southern Michigan has been much wetter this fall and this will slow their harvest progress.

On the Cubs/Indians contest – I am pulling for the Chicago side in this one, but St Louis is my favorite team. At any rate, I do appreciate the link to the Edmund Fitzgerald Porter. May have to look for some locally regardless of who wins tonight (go Cubbies!).

Which team was it that hit only the 19th grand slam in WS history last night? Oh yeah, da Cubs 🙂 With one stroke of the bat Russell drove in more runs than the Tribe in all nine innings. Ok, no more gloating… let’s let ’em decide it on the field.

And as for fitness (the genetic one) – I aim to dig through your last several posts and scratch my head some more on the Orr article. There is something here that seems askew to me at first blush.

OK, I hadn’t realized the importance of grain drying but that seems reasonable. So if anything, assuming global warming has an autumnal aspect to it, we may be moving towards ever later harvests.

Fantastic WS, except for the result, and that’s a big except. Will be a while before I get over that, the Tribe being my 2nd favorite team, and not by much either. Suffice it to say that I do not accept that the better team won.

Yes! Little to do with luck. My personal bugaboo is BABIP. Supposedly every player has a number that is reflective of their career, and if they are temporarily above, it is good luck, and if they are below, it is bad luck. But BABIP is glaringly obviously related to how hard you hit the ball, and players go through cycles of good and bad hitting. It just looks like luck because the aperiodic fluctuations in demonstrated skill are so large.

On a positive note, this computer seems to have healed and I can now communicate with this site again. Woohoo!

Fully agree Matt. That has been more or less my exact argument at more than one baseball blog. You wouldn’t believe the degree to which some of these bloggers and their readers take various “sabermetric” variables as some kind of revolutionary gospel without thinking about things. More generally, noise is only noise because we don’t have the data to explain its cause. As true in baseball as it is in science.

Also, your point is great in that it has reminded me of something I want to clarify in the post–a subtle but important point on that very issue that goes beyond what I said in the piece.

Nice post. A couple of comments:

1) There has been a theoretical derivation of the Pythagorean formula. See https://arxiv.org/pdf/math/0509698.pdf

2) It seems to me that the Poisson distribution would not be an appropriate model in this context. The Poisson assumes that the events being counted are independent and that events cannot occur simultaneously. In baseball, each event is a run scored. A home run can score multiple runs at once, so one event may not be independent of the others, and multiple events can occur at the same time. The Poisson model would work better for a sport like hockey or soccer, where only one goal can be scored at a time.

Hi Steve,

Yes, I remember downloading that pre-print and looking at it then, but not reading it carefully–definitely interesting. The Weibull is a 3-parameter distribution, and can thus readily be fit (and potentially over-fit) to many situations. Their “theoretical” derivation appears, at first take, to be just a fitting of the Weibull parameters to a different type of data (game score data) than the pythag uses (win %). Is there more to it? I also don’t believe James himself, or others working on the topic, ever tried to go down any theoretical roads, but I could be wrong.

On point two, I agree that Poisson expectations apply to independent events only. But the events here are not the individual runs scored (or allowed) during games, but rather the total number scored (or allowed) in a game, over sets of games (e.g. a season). Non-independence in such data would be evidenced by more (or less) numbers of games having particular numbers of runs scored (or allowed) than expected, given the overall means of each. If we were looking at the distribution of runs across game innings, then your point would be quite right.

Lastly, I think I may have confused things by discussing use of the Poisson, and a Monte Carlo sampling, together. Strictly speaking, to address the question James is addressing–teams winning more or fewer games than they “should” have–I don’t think one needs any theory at all: just perform a Monte Carlo on the individual game score results. But…if the answer is, e.g. “team X won 5 more games than it should have”, you still have the thorny issue of attribution as to why–this is the central issue. Poisson analysis can provide insight on that.