One hundred years of NHL hockey; some analysis

This post has been updated, with corrected data and modified discussion, as detailed in the text.

Does anything say “100 Years of the National Hockey League” like, say, a Tampa Bay vs. Vegas matchup? Montreal, Toronto, Ottawa? Please; bunch of party crashers, them.

In case you missed it, National Hockey League play is, today, exactly 100 years old. On December 19, 1917, the first two games in the new league had the Toronto Arenas at the Montreal Wanderers, and the Montreal Canadiens at the Ottawa Senators. This limited slate was due in large part to those being the only four teams in the league. It turns out that the Wanderers got the first, and only, win of their franchise history, which lasted just six games; they got past the Arenas by the common hockey score of 10-9. The Arenas, conversely, went on–along with the Canadiens–to become one of the two most storied franchises in NHL history: today’s Toronto Maple Leafs. The Senators’ first incarnation lasted until 1934, and after a 58-year absence came the second (and current) version in 1992.

So anyway, there’s hype and hoopla happening, and also discussions of the greatest seasons, teams, players, etc. As for me, I thought it would be great fun to crunch 90 years of team-season numbers to see what they indicated about team records, actual versus expected. Two minutes for tripping, and without even inhaling anything.



WAR, Pythagoras, Poisson and Skellam

Getting into some issues only makes you wish that you hadn’t, when you realize how messed up they are, at a fundamental level.

Here’s a great example involving statistical analysis, as applied to win/loss (“WL”) records of sports teams, the base concept of which is that it’s possible to estimate what a team’s WL record “should” have been, based on the number of goals/runs/points that it scored, and allowed, over a defined number of games (typically, a full season or more). This blog post by Bill James partially motivates my thoughts here.

Just where and when this basic idea originated I’m not 100 percent sure, but it appears to have been James, three to four decades ago, under the name “Pythagorean Expectation” (PE). Bill James, if you don’t know, is the originator, and/or popularizer, of a number of statistical methods or approaches applied to baseball data, which launched the so-called “SABR-metric” baseball analysis movement (SABR = Society for American Baseball Research). He is basically that movement’s founder.

In the linked post above, James uses the recent American League MVP votes for Jose Altuve and Aaron Judge to make some great points regarding the merit of WAR (Wins Above Replacement), arguably the most popular of the many SABR-metric variables. The legitimacy of WAR is an involved topic on which much virtual ink has been spilled, but it is not my focus here; in brief, WAR tries to estimate the contribution each player makes to his team’s WL record. In the article, James takes pointed exception to how WAR is used (by some, who argue based upon it that the two players were basically equally valuable in 2017). In the actual MVP vote, Altuve won by a landslide, and James agrees with the voters’ judgement (pun intended): WAR is flawed in evaluating true player worth in this context. Note that numerous problems have been identified with WAR, but James is raising a new and serious one, and from a position of authority.

One of James’ main arguments involves inappropriate use of the PE, specifically that the “expected” number of wins by a team is quite irrelevant–it’s the *actual* number that matters when assessing any given player’s contribution to it. For the 2017 season, the PE estimates that Judge’s team, the New York Yankers, “should” have gone 101-61, instead of their actual 91-71, and thus in turn, every Yanker player is getting some additional proportion of those ten extra, imaginary wins, added to his seasonal WAR estimate. For Altuve’s team, the Houston Astros, that’s not an issue because their actual and PE WL records were identical (both 101-61). The WAR-mongers, and most self identified SABR-metricians for that matter, automatically then conclude that a team like this year’s Yanks were “unlucky”: they should have won 101 games, but doggone lady luck was against ’em in distributing their runs scored (and allowed) across their 162 games…such that they only won 91 instead. Other league teams balance the overall ledger by being luck beneficiaries–if not outright pretenders. There are major problems with this whole mode of thought, some of which James rips in his essay, correctly IMO.

But one additional major problem here is that James started the PE craze to begin with, and neither he, nor anybody else who has subsequently modified or used it, seems to understand the problems inherent in that metric. James instead addresses issues in the application of the PE as an input to WAR, not the legitimacy of the PE itself. Well, there are in fact several issues with the PE, ones that collectively illustrate important issues in statistical philosophy and practice. If you’re going to criticize, start at the root, not the branches.

The issue is one of statistical methodology, and the name of the metric is itself a big clue–it was chosen because the PE formula is similar to the Pythagorean theorem of geometry: A^2 + B^2 = C^2, where A, B and C are the three sides of a right triangle. The original (James) PE equation was: W = S^2 / (S^2 + A^2), where W = a team’s winning percentage, S = its total runs scored and A = its total runs allowed, summed over one or more seasons. That is, it supposedly mimicked the ratio of the squared length of one side to that of the hypotenuse of a right triangle. Just how James came to this structural form, and these parameter values, I don’t know, and likely very few besides James himself do; presumably the details are in one of his annual Baseball Abstracts from 1977 to 1988, since he doesn’t discuss the issue, that I can see, in either of his “Historical Baseball Abstract” books. Perhaps he thought that runs scored and allowed were fully independent of each other, orthogonal, like the two sides of a right triangle. I don’t know.
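To make the formula concrete, here’s a minimal sketch of it in R; the season totals are hypothetical, not any actual team’s:

S <- 820        # hypothetical season total of runs scored
A <- 700        # hypothetical season total of runs allowed
G <- 162        # games played
pe <- S^2 / (S^2 + A^2)         # James' original form, exponent fixed at 2
round(pe * G)                   # the "expected" win total for the season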

It seems to me very likely that James derived his equation by fitting various curves to some empirical data set, although it is possible he was operating from some (unknown) theoretical basis. Others who followed him, and supposedly “improved” the metric’s accuracy, definitely fitted curves to data, since all parameters (exponents) were lowered to values (e.g. 1.81) for which no theoretical basis is even conceivable: show me the theoretical basis for anything that scales up or down as the 1.81 power of the ratio between one component and the sum of its parts. The current PE incarnation (claimed by some as the definitive word on the matter) has the exponents themselves as variables, dependent on the so-called “run environment”: the total number of runs scored and allowed per game. Thus, the exponent for any given season is estimated as R^0.285, where R is the average number of runs scored per game (both teams combined) over all games of a season.
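And a correspondingly minimal sketch of that variable-exponent version, using the same hypothetical totals:

S <- 820; A <- 700; G <- 162            # hypothetical season totals again
x <- ((S + A) / G)^0.285                # season-specific exponent from the run environment
S^x / (S^x + A^x)                       # expected winning percentage under this version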

Even assuming that James did in fact try to base his PE on theory somehow, he didn’t do it right, and that’s a big problem, because there is in fact a very definite theoretical basis for exactly this type of problem…but one never followed, and apparently never even recognized, by SABR-metricians. At least I’ve seen no discussion of it anywhere, and I’ve read my share of baseball analytics essays. Instead, it’s an example of the curve-fitting mentality that is utterly ubiquitous among them. (I have seen some theoretically driven analytics in baseball, but mostly as applied to ball velocity and trajectory off the bat, as predicted from, e.g., bat and ball elasticity, temperature, launch angle, etc., and also to the analysis of bat breakage, a big problem a few years back. And these were by Alan Nathan, an actual physicist.)

Much of science, especially non-experimental science, involves estimating relationships from empirical data. And there’s good reason for that–most natural systems are complex, and often one simply does not know, quantitatively and a priori, the fundamental operating relationships upon which to build a theory, much less how those interact with each other in complex ways at the time and space scales of interest. Therefore one tries instead to estimate those relationships by fitting models to empirical data–often some type of regression model, but not necessarily. It goes without saying that since the system is complex, you can only hope to detect some part of the full signal amid the noise, often just one component of it. It’s an inverse, or inferential, approach to understanding a system, as opposed to forward modeling driven by theory; those are the two opposing approaches.

On those (rare) occasions when you do have a system amenable to theoretical analysis…well you dang well better do so. Geneticists know this: they don’t ignore binomial/multinomial models, in favor of curve fitting, to estimate likely nuclear transmission genetic processes in diploid population genetics and inheritance. That would be entirely stupid, given that we know for sure that diploid chromosomes conform to a binomial process during meiosis the vast majority of the time. We understand the underlying driving process–it’s simple and ubiquitous.

The binomial must be about the simplest possible stochastic model…but the Poisson isn’t too far behind. The Poisson predicts the expected distribution of the occurrence of discrete events in a set of sample units, given knowledge of the average occurrence rate determined over the full set thereof. It is in fact exactly the appropriate model for predicting the per-game distribution of runs/goals scored (and allowed), in sports such as baseball, hockey, golf, soccer, lacrosse, etc. (i.e. sports in which scoring is integer-valued and all scoring events are positive and of equal value).

To start with, the Poisson model can test a wider variety of hypotheses. The PE can only predict a team’s WL record, whereas the Poisson can test whether or not a team’s actual runs scored (and allowed) distribution follows expectation. To the extent that they do follow, that is evidence of true randomness generating the variance in scores across games. This in turn means that the run scoring (or allowing) process is stationary, i.e., it is governed by an unchanging set of drivers. Conversely, if the observed distributions differ significantly from expectation, that’s evidence that those drivers are not stationary, meaning that a team’s inherent ability to score (and/or allow) runs is dynamic–it changes over time (i.e. between games). That’s an important piece of knowledge in and of itself.
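Here’s a minimal sketch of that test in R, using simulated runs in place of a real game log. The goodness-of-fit check is rough (the Poisson mean is estimated from the same data, which the test doesn’t account for), but it shows the mechanics:

set.seed(1)
runs <- rpois(162, 4.5)                  # simulated stand-in for a team's game-by-game runs scored
lam <- mean(runs)                        # Poisson rate, estimated from the season itself
kmax <- max(runs)
obs <- tabulate(runs + 1, nbins = kmax + 1)          # number of games with 0, 1, ..., kmax runs
expected <- dpois(0:kmax, lam)
expected[kmax + 1] <- expected[kmax + 1] + ppois(kmax, lam, lower.tail = FALSE)   # fold the upper tail into the last bin
chisq.test(obs, p = expected, simulate.p.value = TRUE, B = 10000)    # rough test of Poisson fit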

But the primary question of interest here involves the WL record and its relationship to runs scored and allowed. If a team’s runs scored and allowed both closely follow Poisson expectations, then prediction of the WL record follows from theory. Specifically, the distribution of the difference between two Poisson variables follows the Skellam distribution, described by the British statistician J.G. Skellam in 1946; he is also known for his extensive later work on point processes. That is, the Skellam directly predicts the WL record whenever the Poisson assumptions are satisfied. However, even if a team’s run distribution deviates significantly from Poisson expectation, it is still possible to accurately estimate the expected WL record by simply resampling–drawing randomly several thousand times from the observed distributions–letting computers do what they’re really good at. [Note that in low scoring sports like hockey and baseball, many ties will be predicted at the end of regulation play, and sports differ greatly in how they break them. The National Hockey League and Major League Baseball are very different in this respect, especially now that NHL ties can be decided by shoot-out, which is a completely different process than regulation play. In either case, it’s necessary to identify games that are tied at the end of regulation.]
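For what it’s worth, the Skellam part is easy to sketch in base R, since its pmf can be written with the modified Bessel function; the per-game rates below are hypothetical, not a real team’s:

dskellam <- function(k, mu.s, mu.a) {            # Skellam pmf for D = scored - allowed
  exp(-(mu.s + mu.a)) * (mu.s / mu.a)^(k / 2) * besselI(2 * sqrt(mu.s * mu.a), abs(k))
}
mu.s <- 4.8                                      # hypothetical per-game scoring rate
mu.a <- 4.1                                      # hypothetical per-game allowing rate
k <- -30:30                                      # plausible range of per-game run differentials
p <- dskellam(k, mu.s, mu.a)
c(win = sum(p[k > 0]), loss = sum(p[k < 0]), tie = p[k == 0])   # regulation outcomes; ties then go to the sport's own rules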

If instead you take an empirical data set and fit some equation to those data–any equation, no matter how good the fit–you run the risk of committing a very big error indeed, one of the biggest you can in fact make. Specifically, if the data do in fact deviate from Poisson expectation, i.e. non-stationary processes are operating, you will mistake your data-fitted model for the true expectation–the baseline reference point from which to assess random variation. Show me a bigger error you can make than that one–it will affect every conclusion you subsequently come to. So, if you want to assess how “lucky” a team was with its WL record, relative to runs scored and allowed, don’t do that. And don’t get me started on the use of the term “luck” in SABR-metrics, when what they really mean is chance, or stochastic, variation. The conflation of such terms, in sports that very clearly involve heavy doses of both skill and chance, is a fairly flagrant violation of the whole point of language. James is quite right in pointing this out.

I was originally hoping to get into some data analysis to demonstrate the above points but that will have to wait–the underlying statistical concepts needed to be discussed first and that’s all I have time for right now. Rest assured that it’s not hard to analyze the relevant data in R (but it can be a time-consuming pain to obtain and properly format it).

I would also like to remind everyone to try to lay off high fastballs, keep your stick on the ice, and stay tuned to this channel for further fascinating discussions of all kinds.  Remember that Tuesdays are dollar dog night, but also that we discontinued 10 cent beer night 40 years ago, given the results.

What was it, roughly, that we were thinking there, if anything? Part two.

So, there was a high school class reunion a few months back, and it’s World Series time again, so now seems a good time for an overdue, second episode of our series of the above title. In episode one, I explored an incident involving sub-optimal decision making in high school so I think I’ll just continue on that theme here.

I saw a number of old classmates, and teammates, at the reunion. I think class reunions are great. They can cause one to reflect on really important topics, such as the passage of time, or the nature of life’s changes. Or to even deeper things. Explosives for example. Just what “loud” really entails. The nature of stupidity.

It seems that I had become aware that personal fireworks were legal in the next county, and had thus traveled the 40 miles to obtain a few dozen “M-80” fireworks, ostensibly for use during the Fourth of July. It also seems that sometime later, my friend Steve and I found ourselves parked in front of our friend Doug Brown’s house after dark, with said bag of M-80s and a lighter. Now, an M-80, we’d been told, contained the equivalent gunpowder of a quarter stick of dynamite, which I thought was pretty impressive but did no actual testing of. If one of these things goes off on, say, someone’s front porch, it would not typically go unnoticed, and that concept did seem, to us, worthy of some testing at that particular time.

It additionally seems that I was the driver and Doug’s house was off to our left. The plan, which I think we put a solid 30 seconds of thought into, was that we would launch one of these onto Doug’s porch–about maybe 75 ft away–while seated in the vehicle, so as to effect a prompt getaway. We came up with a fair and efficient division of labor in which Steve would light the fuse and hand the thing to me–I would then fire it toward the porch and immediately hit the gas, making ourselves rapidly scarce. It was a great plan as far as I was concerned: all I had to do was throw and floor it, whereas Steve had the equivalent of six to eight sticks of dynamite in a bag on his lap, with an open flame in his hand. This struck me as equitable, given that I was providing the vehicle and the right arm.

So…what’s the baseball connection here, you may wonder. Well, I played shortstop in high school, whereas Steve didn’t, and so it was logical that I should do whatever throwing was involved. Shortstop is a fun position, because you get to sprint to chase down ground balls, and then watch the first baseman sprint to chase down the throw you just sailed some distance beyond him. Now, 70-80 feet is a lot shorter than a typical throw from shortstop to first base…but an M-80 is also a lot lighter than a baseball. So I knew I should put some mustard on it to ensure getting it at least somewhere near the porch. Being quite experienced at firing balls into the adjacent woods from deep short, I wasn’t too worried about it. If the M-80 banged off the front of the house first or whatever, no big deal, I mean assuming nobody opened the front door at the wrong instant.

Now may be a good time to remind ourselves of the importance of taking all potentially relevant variables into consideration–apriori even–in events like these. And do we think enough about the tangible value of trial runs? Probably some room for improvement there too.

Anyway, Steve successfully got said firework lit without blowing us up, and the ensuing exchange to me was also flawless. With right arm extended and a good five seconds or so to work with, I eyed Doug’s porch and applied my best Nolan Ryan fastball to the explosive. Now, I think it’s fair to say that (1) the average person is just not that aware of exactly where one’s car door meets one’s car roof, (2) that I qualify as quite average in that context, and (3) that that specific location took on above average significance, in that particular situation. In short, when my hand was just about to send said explosive device on its planned trajectory, said hand was inadvertently applied, with considerable force, to said vehicular location, and separated from said device, thereby placing the latter on a trajectory not nearly as likely to achieve the original objective. This in turn would necessitate a rapid adjustment in plan and action, not to mention vocalization.

This is more or less a science blog, and I ask you, are many topics more fascinating, really, than the physics of acoustics under confinement? Maybe heredity–I find that interesting too. Also, involuntary reflexes, impromptu vocalizations: good stuff. How about hand-eye coordination under duress? Personal safety and survival? Blood? Bodily dismemberment? All topics worthy of consideration when you get down to it. Let’s explore some of these for just a moment.

Acoustic physics, let’s take. As we know, Newton’s First Law of Loud states: “Any acoustically active device, placed under spatial confinement, will manifest even more of its acoustic characteristics, in fact quite a lot more than you’d think just from theory alone”. Take spatial relationships: just how much room for rapid bodily movement is there, really, in the front passenger seat of a typical car? How can humans maximize movement efficiency in response to active, explosive devices experiencing random trajectories?

Now back to our story. To cut to the chase, upon hand-car impact, our active device–the one under current discussion–experienced a rather sudden change of x coordinate velocity–one markedly away from Doug’s front porch, opposite that really, which is to say in the general direction of one Steve. More specifically, toward Steve’s male-specific, hereditarily significant anatomy. And there it landed, for a brief moment. Although entirely stunned, and with my hand feeling possibly broken, I was still able to collect myself, breathe a sigh of relief and comment on just how fortunate we were, really, that said M-80 had not taken an alternate trajectory and landed instead, in or near the bag of 30 or so other M-80s, within the confines of our vehicle, in which we too were present, due to our plan, in the street in front of our friend Doug Brown’s house. Steve also reflected for a bit on this fortunate state and concurred that such an outcome would have been potentially problematic on several counts, not the least of which was just how autopsies and identifications based on scattered body parts are conducted.

The preceding is not in fact what transpired at that moment.

Rather, Steve executed what I think to this day is the most rapid series of body movements I’ve ever seen from a human being, with the possible exception of the time I scrambled up and over a rock to find my neck about three feet directly in front of the head of a large rattlesnake. Conscious thought was not part of the process. M-80s had fuse times of roughly six to seven seconds, going strictly from memory. I’d guestimate that at this point, about four of those remained. As I recall it, there were, in order (1) an involuntary yell, (2) a ceiling-constrained jump upward, and (3) a failed attempt to flick the thing, by a backhand motion, away from where it resided. This process took maybe two seconds, maximum, and led to another entirely frantic attempt–panicked would work–which succeeded in flinging the thing down towards Steve’s feet. This, very fortunately, was not where the bag of other M-80s had been placed, and additionally, it’s one thing to have your feet blown off but quite another to have your evolutionary lineage ended.

Down there our device detonated, with a flash, maybe 1 to 1.5 seconds later.

What M-80 detonations lack in duration and beauty of light display they make up for in sheer decibels; they aren’t fireworks so much as small bombs. This was the most unbelievably loud thing I’d ever heard, and that includes seeing Ted Nugent in the old concrete and steel Toledo Sports Arena (also with Steve). It was concussive. Steve told me he basically could not hear for several days. The car was immediately filled with an acrid cloud of sulfurous smoke. My hand felt very possibly broken. I could neither hear nor see, and my first thought was “We gotta get out of here right NOW, before Doug comes out and sees this”. Or even worse, his dad, with a possible call to the police. But even in the best of circumstances, it’s not easy to go straight from Nolan Ryan to Mario Andretti, quickly. I could not see without sticking my head out the window, which I did until the breeze thus created cleared out the cab. I’m not sure that Steve knew exactly what had happened or even where he was, but I didn’t have time to investigate. I was pretty sure he was alive and that would have to be good enough for the moment.

I think the evening’s festivities were concluded with this event, although I wouldn’t necessarily place money on that either. If Doug is reading, I’d like to formally apologize for the rubber patch laid in front of his house and any subsequent effect on property values that may have resulted.

Thanks for reading and please stay tuned for the next episode, in which we’ll explore how surprisingly inconvenient cul-de-sacs can be in certain circumstances, and/or other fascinating topics.

SABR-toothed

Well they’ve been running around on the flat expanses of the early Holocene lake bed with impressively large machines, whacking down and gathering the soybeans and corn. This puts dirt clods on the roads that cause one on a road bike at dusk to weave and swear, but I digress. The Farmer’s Almanac says that it must therefore be about World Series time, which in turn is just about guaranteed to initiate various comments regarding the role of luck, good or bad, in deciding important baseball game outcomes.

There are several important things to be blurted out on this important topic and with the Series at its climax and the leaves a fallin’ now’s the time, the time is now.

It was Bill James, the baseball “sabermetric” grandpa and chief guru, who came up with the basic idea some time ago, though not with the questionable terminology applied to it I think, which I believe came later from certain disciples who knelt at his feet.

The basic idea starts off well enough but from there goes into a kind of low-key downhill slide, not unlike the truck that you didn’t bother setting the park brake for because you thought the street grade was flat but found out otherwise a few feet down the sidewalk. At which point you also discover that the bumper height of said truck does not necessarily match that of a Mercedes.

The concept applies not just to baseball but to anything involving integer scores. The basic idea is as follows (see here). Your team plays 162 baseball games, 25 soccer matches or whatever, and of course you keep score of each. You then compute the fraction S^x/(S^x + A^x), where, using the baseball case, S = runs scored, A = runs allowed and x = an exponent that varies depending on the data used (i.e. the teams and years used). You do this for each team in the league and also compute each team’s winning percentage (WP = W/G, where W = number of wins and G = games played in the season(s)). A nonlinear regression/optimization returns the optimal value of x, given the data. The resulting fraction is known as the “pythagorean expectation” of winning percentage, purporting to tell us how many games a given team “should” have won and lost over that time, given its total runs scored and allowed.
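For the curious, here’s roughly what that fitting step looks like in R: a one-dimensional least-squares optimization over x. The “league” data are fabricated stand-ins purely so the code runs; swap in real team totals to use it:

set.seed(42)
n.teams <- 30
S <- rpois(n.teams, 730)                           # stand-in team totals of runs scored
A <- rpois(n.teams, 730)                           # stand-in team totals of runs allowed
WP <- 0.5 + (S - A) / 1000 + rnorm(n.teams, 0, 0.02)    # stand-in winning percentages
pythag <- function(x, S, A) S^x / (S^x + A^x)
sse <- function(x) sum((WP - pythag(x, S, A))^2)        # squared error for a given exponent
optimize(sse, interval = c(1, 3))$minimum               # best-fit exponent, given these data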

Note first that the value of x depends on the data used: the relationship is entirely empirically derived, and exponents ranging from (at least) 1.8 to 2.0 have resulted. There is no statistical theory here whatsoever, and in no description of “the pythag” have I ever seen any mention of such. This is a shame because (1) there can and should be, and (2) it seems likely that most “sabermetricians” don’t have any idea as to how or why. Maybe not all, but I haven’t seen any discuss the matter. Specifically, this is a classic case for application of Poisson-derived expectations.

However the lack of theory is one, but not really the main, point here. More at issue are the highly questionable interpretations of the causes of observed deviations from pythag expectations, where the rolling truck smashes out the grill and lights of the Mercedes.

You should base an analysis like this on the Poisson distribution for at least two very strong reasons. First, interpretations of the pythag always involve random chance. That is, the underlying view is that departures of a given team’s won-loss record from pythag expectation are always attributed to the action of randomness–random chance. Great: if you want to go down that road, that’s exactly what the Poisson distribution is designed to address. Second, it will give you additional information regarding the role of chance that you cannot get from “the pythag”.

Indeed, the Poisson gives the expected distribution of integer-valued data around a known mean, under the assumption that random deviations from that mean are solely the result of sampling error, which in turn results from complete randomness of the objects (the spatial analogue is Complete Spatial Randomness, CSR) relative to the mean value and the size of the sampling frame. In our context, the sampling frame is a single game and the objects of analysis are the runs scored, and allowed, in each game. The point is that the Poisson is inherently designed to test exactly what the SABR-toothers want to test. But they don’t use it–they instead opt for the fully ad hoc pythag estimator (or slight variations thereof). Always.

So, you’ve got a team’s total runs scored and allowed over its season. You divide each by the number of games played to get the mean per game. That’s all you need–the Poisson is a single-parameter distribution, its variance equal to its mean. Now you use that computer in front of you for what it’s really ideal at–doing a whole bunch of calculations really fast–to simply draw from the runs scored, and runs allowed, distributions, randomly, say 100,000 times or whatever, to estimate your team’s expected won-loss record under a fully random score-generating process. But you can also do more–you can test whether either the runs scored or allowed distribution fits the Poisson very well, using a chi-square goodness-of-fit test. And that’s important because it tells you, basically, whether or not they are homogeneous random processes–processes in which the data generating mechanism is unchanging through the season. In sports terms: it tells you the degree to which the team’s performance over the year, offensive and defensive, came from the same basic conditions (i.e. unchanging team performance quality/ability).
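A minimal sketch of that simulation step in R, with hypothetical season totals standing in for a real team’s:

set.seed(7)
G <- 162
runs.for <- 765; runs.against <- 693             # hypothetical season totals
lam.f <- runs.for / G                            # per-game means: the only parameters the Poisson needs
lam.a <- runs.against / G
n.sim <- 100000
sf <- rpois(n.sim, lam.f)                        # simulated runs scored, one draw per game
sa <- rpois(n.sim, lam.a)                        # simulated runs allowed
p.win <- mean(sf > sa) + 0.5 * mean(sf == sa)    # split regulation ties evenly, for simplicity
round(p.win * G)                                 # expected wins over a 162-game season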

The biggest issue remains however–interpretation. I don’t know how it all got started, but somewhere, somebody decided that a positive departure from “the pythag” (more wins than expected) equated to “good luck” and negative departures to “bad luck”. Luck being the operative word here. Actually I do know the origin–it’s a straightforward conclusion from attributing all deviations from expectation to “chance”. The problem is that many of these deviations are not in fact due to chance, and if you analyze the data using the Poisson as described above, you will have evidence of when that is, and is not, the case.

For example, a team that wins more close games than it “should”–games won by say just one or two runs–while getting badly smoked in a small subset of other games, will appear to benefit from “good luck”, according to the pythag approach. But using the Poisson approach, you can identify whether or not a team’s basic quality likely changed at various times during the season. Furthermore, you can also examine whether the joint distribution of events (runs scored, runs allowed) follows random expectation, given their individual distributions. If it does not, then you know that some non-random process is going on. For example, the team that wins (or loses) more than its expected share of close games most likely has some ability to win (or lose) close games–something about the way the team plays explains it, not random chance. There are many particular explanations, in terms of team skill and strategy, that can account for such results, and more specific data on a team’s players’ performance can lend evidence to the various possibilities.
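One simple way to sketch that check in R: shuffle one margin of the per-game (scored, allowed) pairs, which preserves both individual distributions while breaking any within-game dependence, then compare the observed number of one-run wins to the shuffled expectation. The game log below is simulated just so the code runs; substitute a real one to use it:

set.seed(11)
G <- 162
sc <- rpois(G, 4.7)                              # stand-in runs scored, game by game
al <- rpois(G, 4.4)                              # stand-in runs allowed, game by game
one.run.wins <- function(s, a) sum(s - a == 1)
obs <- one.run.wins(sc, al)
null <- replicate(10000, one.run.wins(sc, sample(al)))   # permute the allowed column across games
c(observed = obs, expected = mean(null), p.value = mean(null >= obs))   # one-sided: more one-run wins than independence predicts?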

So, the whole “luck” explanation that certain elements of the sabermetric crowd are quite fond of and have accepted as the Gospel of James, may be quite suspect at best, or outright wrong. I should add however that if the Indians win the series, it’s skill all the way while if the Cubs win it’ll most likely be due to luck.

On throwing a change up when a fastball’s your best pitch

Sports are interesting, and one of the interesting aspects about them, among many, is that the very unlikely can sometimes happen.

The Louisville Cardinals baseball team went 50-12 this year through the regular season and first round (“regional”) of the NCAA baseball playoff. Moreover, they were an astounding 36-1 at home, the only loss coming by three runs at the hands of last year’s national champion, Virginia. Over the last several years they have been one of the best teams in the country, making it to the College World Series twice, though not yet winning it. They were considered by the tournament selection committee to be the #2 team in the country, behind Florida, but many of the better computer polls had Louisville as #1.

The college baseball playoff is one of the most interesting tournaments out there, from a structural perspective. Because it’s baseball, it’s not a one-loss tournament, at any of the four levels thereof, at least since 2003. Those four levels are: (1) the sixteen regionals of four teams each, (2) the eight “super regionals” between the regional champions, and (3) and (4) the two rounds of the College World Series in Omaha, comprising the eight super regional champions. A team can in fact lose as many as four games total over the course of the playoff, and yet still win the national championship. It’s not easy to do though, because a loss in the first game, at either the regional level or in round one of the CWS, requires a team to win four games to advance, instead of three. In the 13 years of this format, only Fresno State has pulled that feat off, in 2008.

In winning their regional and being one of the top eight seeds, Louisville hosted the winner of the Nashville regional, which UC Santa Barbara of the Big West Conference won in an upset over favorite Vanderbilt. That conference is not as good top to bottom as the Atlantic Coast Conference (ACC) that Louisville plays in, but neither is it any slouch, containing perennial power CSU Fullerton, and also Long Beach State, who gave third-ranked Miami fits in its regional. More generally, the caliber of baseball played on the west coast, including the PAC-12 and the Big West, is very high, though often slighted by writers and pollsters in favor of teams from the southeast (the ACC and Southeastern (SEC) conferences in particular). Based on the results of the regional and super regional rounds, the slighting this year was serious: only two of the eight teams in the CWS are from the ACC/SEC, even though teams from those two conferences had home field advantage in fully 83 percent (20/24) of all the first and second round series. Five schools west of the Mississippi River are in, including the top three from the Big 12 conference.

In the super regional, the first team to win twice goes on to the CWS in Omaha. To make a long and interesting story short, UCSB won the first game 4-2 and thus needed just one more win to knock out Louisville and advance to the CWS for the first time in their history. Down 3-0, in the bottom of the ninth inning, they were facing one of the best closers in all of college baseball, just taken as the 27th overall pick in the MLB amateur draft by the Chicago White Sox. Coming in with 100+ mph fastballs, he got the first batter out without problem. However, the second batter singled, and then he began to lose his control and he did exactly what you shouldn’t do: walked the next two batters to load the bases. The UCSB coach decided to go to his bench to bring in a left-handed hitting pinch-hitter, a freshman with only 26 at-bats on the season, albeit with one home run among his nine hits on the year.

And the rest, as they say, is history:

(All the games from this weekend are available for replay here)

Ask the experts, part n

Well we’re long overdue for another installment of “Ask the Self-Appointed Experts”, or at least for the question part. In today’s edition a follower from Two Forks, Montana, wrestles with the following conundrum, inviting others to help, or at least reassure him that he is not confused alone. He writes:

I know this issue is all over AM talk radio but the inmates pretty clearly run the asylum there and I’m more confused on the following issue than ever.

It is well known that, given a known rate process, the gamma distribution defines the expected values from random starting points to the nth “nearest” object, and so inversely, we can estimate unknown rates by such values. For example, in a forest of randomly distributed trees, the circular area, a, defined by the distance to the nth closest tree, will estimate tree density. But as Skellam (1952), Moore (1954) and Pollard (1971) showed analytically, these estimates are biased, in inverse magnitude to the value of n, specifically, as n/(n-1) for n > 1. Thus, the distance to, say, the 2nd closest tree will correspond to the area represented by one tree, not two. All very well and good.

Now, the mean of the integration of the gamma distribution from 0 to 1, for a known rate, should return the mean area a, but when I closely approximate the integral (in R, which can’t integrate), I seem to get bias-corrected values reflecting the rates, rather than the biased values reflecting the areas (a) to the nth object. I’m flummoxed and not a little aggravated. Do they know what they’re doing there at R headquarters, or is it me that’s got all turned about the wrong way? If I can’t even trust the values from the statistical distributions in R, then just what can I trust there? I tried taking my mind off the matter by following the Winnipeg Jets (WJPCA, 2015), but man, one can just take only so much of that and I sure as hell ain’t going to follow Edmonton. The ice fishing seems to help, at least until the alcohol wears off, but really there should be an 800 number I think. If you can put me onto some clues I would be most grateful.

My R code and results (for the n = 2 object):

probs = seq(from = 0.000001, to = 0.999999, by = 0.000001)		# evenly spaced probability steps
mean.area = mean(qgamma(p=probs, shape=2, rate = 1, lower.tail = T))	# approximate the pdf integral, as the mean of the sampled distribution
mean.area								# print the result
[1] 1.999993

1.999993, WTF !??!

References:
Skellam, J.G. 1952. Studies in statistical ecology: I. Spatial pattern. Biometrika 39:346-362.
Moore, P.G. 1954. Spacing in plant populations. Ecology 35:222-227.
Pollard, J.H. 1971. On distance estimators of density in randomly distributed forests. Biometrics 27:991-1002.
WJPCA, 2015. Your 2015-2016 Winnipeg Jets: Fast, Furious and Fun. Winnipeg Jets Promotional Coordinating Association.

Sincerely,
Stewart Stansbury, Two Forks MT (“not quite Canada but roughly as empty”), USA

p.s. Jets picture enclosed

Jets player checking imaginary opponent hard into the glass, to the delight of all


Gauley time!

Black cherry trees in ripe fruit and goldenrod in full bloom and that can only mean one thing, and no I’m not talking about football season.

They opened the Summersville Dam gates at 7AM this morning; for godsake get down, or up, or over there in the next six weeks and try to kill yourself with all the others if you can. There will be a party, a rather large and extended one, and it’s anybody’s guess as to whether river flow will exceed that of beer. Now, when on the river, try to remember, apriori if possible, that plastic (or rubber) side down is optimal, that rocks are typically fairly hard and to take a big gulp of air before you go under. Remembering these aposteriori is fairly automatic. Everything else is open to personal interpretation.

Best to put in downstream from the nozzle a bit, although I’m sure it’s been tried:
[photo: Gauley open tunnel]

What made America great:
[photo: Gauley, pink dory at pillow rapid]

This is probably sub-optimal form:

Definite sub-optimal form:
[photo: Gauley, poor form]

This can be made to work for a while, like 12 seconds:
[photo: Gauley mattress]

Really excellent form:
[photo: Gauley boof]

[photo: Gauley unload]

Aesculus glabra!


Ezekiel Elliott, Ohio State, breaks through the line in Ohio State’s NCAA football championship game victory over Oregon Monday night, capping an improbable run to the title in the first year of the college football playoff. Photo by Kirby Lee, USA TODAY sports

Awesome Buckeyes, just plain awesome.
Enough said.

Three out of five!

Third baseman Pablo Sandoval hits the ground after catching a foul pop fly for the last out of the 2014 World Series, as the Giants erupt from their dugout.


World Series champs for the third time in the last five years (every other year), those “scratch ’em ’till they bleed to death” San Francisco Giants have done it again. Not quite a dynasty yet, but you have to go back fifteen years or so to find a team better at consistently winning games when they really count, over several years, than this group of characters. When all was said and done, it came down to having the best World Series pitcher in a long, long time on your side.

For the record, I picked the Giants in six. Matt also actually picked the Giants but then went with his “logical opposites theory” to go with the Royals. Harold and Clem, well they were just patently off the deep end 🙂

Predictions for 2015 and 2016 are now open. For 2015 I’m picking anyone except the Giants, and for 2016, I’m going Giants 🙂

Who’s “best” and how do you know it?

So suppose you have your basic Major League Baseball (MLB) structure, consisting of two leagues having three divisions of five teams each, each of which plays a 162-game, strongly unbalanced*, schedule. There are, of course, inherent quality differences among those teams; some are better than others, when assessed over some very large number of games, i.e. “asymptotically”**. The question thus arises in your mind as you ponder why the batter feels the need to step out of the batter’s box after each pitch***: “how often will the truly best team(s) win their league championships and thus play each other in the World Series?” The current playoff structure involves having the two wild card teams play each other in a one-game elimination, which leaves four remaining playoff teams in each league. Two pairings are made and whoever wins three games advances to the league championship series, which in turn requires winning four games.

I simulated 1000 seasons of 162 games with leagues having this structure. Inherent team quality was set by a normal distribution with a mean of 81 wins and a standard deviation of ~7, such that the very best teams would occasionally win about 2/3 (108) of their games, and the worst would lose about that same fraction. Win percentages like those are pretty realistic, and the best record in each league frequently falls between 95 and 100 wins.
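For anyone who wants to tinker, here is a stripped-down sketch of one league’s season and playoff in R. It simplifies relative to the simulation described above (balanced schedule rather than the unbalanced one, log5 head-to-head probabilities, no World Series round), so treat it as an illustration of the approach, not the source of the numbers below:

set.seed(123)
log5 <- function(p1, p2) p1 * (1 - p2) / (p1 * (1 - p2) + p2 * (1 - p1))   # head-to-head win probability

sim.series <- function(p1, p2, wins.needed) {      # TRUE if team 1 wins the series
  w1 <- 0; w2 <- 0
  while (w1 < wins.needed && w2 < wins.needed) {
    if (runif(1) < log5(p1, p2)) w1 <- w1 + 1 else w2 <- w2 + 1
  }
  w1 == wins.needed
}

sim.league <- function(n.div = 3, n.per = 5, G = 162) {
  n <- n.div * n.per
  q <- pmin(pmax(rnorm(n, 81, 7) / 162, 0.30), 0.70)     # true team quality, as a win probability
  division <- rep(1:n.div, each = n.per)
  wins <- numeric(n)
  gpp <- round(G / (n - 1))                              # games per opposing pair (balanced schedule)
  for (i in 1:(n - 1)) for (j in (i + 1):n) {
    w.i <- rbinom(1, gpp, log5(q[i], q[j]))
    wins[i] <- wins[i] + w.i
    wins[j] <- wins[j] + gpp - w.i
  }
  ord <- order(wins + runif(n) * 0.001, decreasing = TRUE)             # standings, random tiebreak
  div.winners <- sapply(1:n.div, function(d) ord[division[ord] == d][1])
  wild.cards <- setdiff(ord, div.winners)[1:2]
  wc <- if (sim.series(q[wild.cards[1]], q[wild.cards[2]], 1)) wild.cards[1] else wild.cards[2]
  seeds <- div.winners[order(wins[div.winners], decreasing = TRUE)]
  s1 <- if (sim.series(q[seeds[1]], q[wc], 3)) seeds[1] else wc        # best-of-5 round
  s2 <- if (sim.series(q[seeds[2]], q[seeds[3]], 3)) seeds[2] else seeds[3]
  champ <- if (sim.series(q[s1], q[s2], 4)) s1 else s2                 # best-of-7 for the pennant
  best <- which.max(q)
  c(best.made.playoffs = best %in% c(div.winners, wc), best.won.pennant = best == champ)
}

res <- replicate(1000, sim.league())       # 1000 simulated seasons for one league
rowMeans(res)                              # how often the truly best team gets in, and wins the pennant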

Results:
1) The truly best team in each league makes the playoffs about 80 percent of the time under the current system, less when only four teams make it.
2) That team wins its league championship roughly 20 to 30 percent of the time, getting knocked out in the playoffs over half the time. It wins the whole shebang about 10 to 15 percent of the time.
3) Whenever MLB expands to 32 teams, in which the playoff structure will very likely consist of the four division winners in each league and no wild card teams, the truly best (and second and third best) teams in each league will both make the playoffs, and advance to the World Series, less frequently than they do now.

This type of analysis is generalizable to other types of competitions under structured systems, at least for those in which the losers of individual contests live to fight another day, or if they don’t, are replaced by others of the same basic quality. The inherent spread in team quality makes a very big difference in the results obtained however. It’ll apply very well to baseball and hockey, but not so well to the NBA, for example.

So the next time an MLB team wins its league, or the World Series, and you’re tempted to think this means they must be the best team in the league (or MLB overall), think about that again. Same for the NHL.

* Currently, each team plays around 3 times as many games against each intra-division opponent as inter-division opponents, not even including the 20 inter-league games (which I’ve ignored in these analyses, assuming all games are within-league).
** These records are conceived of as being amassed against some hypothetical, perfectly average team. This team is from Lake Wobegon Minnesota.
*** It is perfectly OK to think other things of course, and we need not worry about the particulars of the language embodied therein.

On baseball (finally!)

I’ve discussed no baseball here yet, which is kind of surprising, given that I’ve been a big fan all my life. I played a lot growing up, through high school and even a little in college and afterwards. If I had the time, I would likely start a blog just devoted strictly to baseball (and not just analysis either), because I have a lot to say on a lot of topics. But alas…

To me, the real interest in any sport comes from actually playing the game, not watching it, and I watch very little baseball (now) because the games are just too time consuming (though I still have a hard time refraining in October). When I do watch, I’m not obsessively analytical–that takes the fun out of it for me. It’s an athletic contest, not a statistics class; I want to see the center fielder go full speed and lay out for a catch, or a base thief challenge the pitcher or whatever, not sit there with numbers in my head. Analysis is for later, and I do like it, so I wade in at times, thereby joining the SABR-metric (or “sabermetric”) revolution of the last 3-4 decades (the Society for American Baseball Research (SABR) initiated much of this). And baseball offers endless analytical opportunities, for (at least) two reasons.

Billy Hamilton of the Cincinnati Reds



A weird incident

Yesterday I had an interesting experience which I’m not sure how to fully interpret.

I got hit and knocked down by a large SUV while on my bike ride. I’ve ridden unknown thousands of miles in my life and this is the first time I’ve ever been hit. It happened in an unusual way; most riders get hit from behind by a vehicle moving at or near the speed limit. I was lucky–even though I got broadsided from the left, the vehicle was only going maybe 5-7 mph (but accelerating), and I was just starting from a cold stop, barely moving. But I was also in the act of clipping into the pedals, and thus not freely mobile. I was however able, given that I was looking straight at the oncoming vehicle, to turn slightly to the right and get my left hand off the handlebars just enough to prevent a more serious collision. The impact spun me around about 270 degrees and I landed on my left side. What happened next was the interesting part though.

I wasn’t hurt but was stunned and lay on the ground for a few seconds trying to comprehend what had happened. Cars were lined up at a red light and one of the drivers yelled out and asked if I was OK. I said yeah, I thought so, although I wasn’t 100% sure. I saw the SUV pull over–no chance for a hit and run incident at a red light with clear witnesses. Then I see someone with some type of badge on their shirt, though not in a police uniform, walk up to me and say “What do you need?” Paramedic, already? I’m still trying to unclip my right foot from the pedal so I can get up off the roadway, which I finally do.

As I get up I notice a gun on his hip and then realize this is the person who hit me. FBI agent, unmarked car [correction: it was a Homeland Security agent]. I sort of spontaneously say something like “What the hell are you doing you idiot, didn’t you see me?“, among other things. His first response is “You’re supposed to cross the street at the crosswalk up there”. Obvious nonsensical bullshit; we were both emerging from parking lots, on opposite sides of the road, and trying to initiate left hand turns onto the road. We were both in the roadway, and he just simply wasn’t watching, presumably looking over his shoulder to see if there was any traffic coming. I’m just lucky the light 30 m away was red and therefore he didn’t accelerate even more.

The several witnesses to the incident were now departing and I realized immediately that this guy was going to try to deny any responsibility. What I said next is more or less unprintable, FBI agent and gun or no. He said some other nonsense, mainly that he was in fact watching where he was going, the logical conclusion from that being that he must then have hit me on purpose, which we can be pretty sure an FBI agent would not do. I was busy inspecting my bike, which since it took the brunt of the collision, I was sure must be damaged. It’s a LeMond, which went out of business several years ago due to Trek/Armstrong’s reaction to LeMond’s doping allegations against Armstrong. So getting a replacement frame is limited to what you can find on E-bay and similar sites, and also expensive. Amazingly, and much to my great relief, the bike did not appear to suffer any obvious structural damage. The front wheel wasn’t even out of true. Apparently the impact point had been the left ram-horn of the handlebars, and it just flipped me around. Hairline micro-fractures in the frame are still a possibility though; these will only become apparent once they propagate and grow under riding stresses.

The Sheriff showed up about 15 minutes later and filled out a report. He seemed like a good guy, and sympathetic to my version of events, but nevertheless he refused to assign fault to the driver, saying something to the effect that the party further out into the roadway–which was the driver–has the right of way. I don’t think this is correct for a couple of reasons, but there was nothing I could do, given that any witnesses were gone. I was just so glad that neither my bike nor I were damaged that I just didn’t want to press it. Plus there was only about an hour of daylight left and I just wanted to get back on and ride, which is what I did. I even shook the agent’s hand before leaving, which kind of surprised me actually.

But it’s incidents like this, among many others, that make me increasingly suspicious of the trustworthiness of human beings generally. On the other hand, it makes me think of my friend Alan Reinbolt, who, only a couple of years after I did mine, was hit and killed by a large truck on his cross-the-country bike ride, and of the two bikers who’ve already been killed in the county by drivers this year. In those contexts, I’ve been very fortunate indeed.