If you’re a college football follower, you already know that one of the greatest games you will likely ever see was played Saturday. This past weekend was “rivalry weekend”, when many teams with historical rivalries played each other: Ohio State vs Michigan, UCLA vs USC, etc. One of the most intense of these rivalries is Auburn vs Alabama. These two teams have produced the last four national champions, including the last two by Alabama; they entered the game with Alabama ranked first, and Auburn fourth, in the BCS (“Bowl Championship Series”) rankings. They played Saturday at Auburn, in a spectacular game. Auburn won, 34-28, with a miraculous finish, most likely knocking Alabama out of a chance for a third consecutive national championship. In the process, Auburn kept its own title hopes alive and opened the door for Ohio State, who moved into the #2 spot after winning their own dramatic game against Michigan.
But this post is actually much more about math, specifically that involved in the BCS ranking system, how the college football championship game is determined, and about what happened in 2008 and could happen again this year. [I spent too much time on this, but I couldn’t resist]
The BCS system is a somewhat byzantine, much-maligned algorithm that’s been in place for about 15 years now. For whatever reason, there is no playoff in division 1-A football (there is in all other college football divisions however). Not only that, but most teams’ schedules are 12 games, and there are nowhere near enough games played to determine objectively, statistically, who the best team is. But everybody needs to have a champion it seems, no matter how determined, and so a “BCS National Championship Game” (BCS NCG) was devised, with the two teams playing in it determined by a unique rating algorithm. It attempts to integrate human opinion with computer algorithms, ranking all teams after each week’s games are complete; whoever ranks first and second at season’s end plays in the BCS NCG, held in early January.
A team’s BCS ranking is derived from equal weightings (1/3 each) of two different human polls and the mean of a set of six computer ranking systems. Each of the three components spits out a score for each team that ranges between 0.0 and 1.0. One of the human polls (the “Harris Poll”) has over 100 voters (ex-coaches, players, broadcasters, etc), while the other (the USA Today, or “Coaches” poll) is composed of about 60 voters, all active football coaches. The six computer systems are a subset of a much larger set of such systems that exist, sometimes derived from fantasy football or gambling interests, and varying wildly in assumptions, computational details and the like.
I have no idea how these particular voters and computer systems were selected for inclusion, a hugely important issue by itself, but not one that I can address. What constraints or rules are placed on these voters, if any, is not clear, and it most certainly is not clear who actually follows them if they do exist: each voter can make his or her own subjective decision based on whatever criteria he or she thinks are important, including or excluding such critical criteria as the timing of losses, strength of schedule, etc., as he or she sees fit. How the computer systems work is by no means clear either: the algorithms for some are entirely secret, whereas others have some degree of explanation attached at their web sites. This reality is pretty strange when you consider that the whole point of including computer rankings in the first place was to make things more open and supposedly “objective”.
Well, whatever, I’m not in charge of the thing.
Each voter ranks the top 25 teams only, inversely (i.e. the #1 ranked team gets 25 points and the #25 team gets 1 point). These integer points are then summed across all voters, for each team, and this sum is then divided by the maximum possible number of points a team could get, which is simply (25 * n), with n = number of voters. For the six computer systems, each spits out a real number, which is then converted to a simple rank, and point value, just like in the two human polls. For each team, the high and low computer ranks are thrown out, and the remaining four are averaged. The three rankings from the three components are then averaged, giving the final BCS score for each team.
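As a rough sketch of the arithmetic just described, here is how the three components and the final score could be computed in Python. The function names and the toy inputs are my own, made up for illustration; they are not taken from any BCS documentation, and I am assuming a team left off a ballot (or ranked below 25th by a computer) simply gets zero points.

```python
def poll_score(ballots, team):
    """Fraction of the maximum possible points a team earns in a human poll.

    ballots: list of ballots, each an ordered list of team names, best first.
    Assumption: a team missing from a ballot gets 0 points from that voter.
    """
    points = 0
    for ballot in ballots:
        if team in ballot:
            rank = ballot.index(team) + 1   # 1-based rank on this ballot
            points += 26 - rank             # #1 -> 25 points, #25 -> 1 point
    return points / (25 * len(ballots))     # divide by the maximum, 25 * n


def computer_score(ranks):
    """Trimmed score from six computer ranks (1 = best, None = unranked)."""
    points = sorted(26 - r if r is not None and r <= 25 else 0 for r in ranks)
    middle_four = points[1:-1]              # throw out the low and the high
    return sum(middle_four) / (4 * 25)      # four systems left, 25 points max each


def bcs_score(harris, coaches, computers):
    """Equal-weight (1/3 each) average of the three components."""
    return (harris + coaches + computers) / 3
```

For example, a hypothetical team ranked first by three computers and second by the other three keeps two 25s and two 24s after trimming, so `computer_score([1, 1, 1, 2, 2, 2])` comes out to 98/100 = 0.98.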
Now the subtlety: how some simple mathematical considerations of the BCS ranking system essentially screwed the University of Texas out of a shot at the national championship in 2008, in favor of Oklahoma. The BCS system gets criticized, more or less, every year, but 2008 was a very definite fiasco from several angles.
That year, the Big 12 conference had three outstanding teams, and all in its South Division no less: Texas, Texas Tech, and Oklahoma. All ended with 11-1 records, a three-way tie. The Big 12 had in place a tie-breaker system to determine the division winner, but after going through the first four tie-breakers, Texas and Oklahoma were still tied! So, on to the fifth tie-breaker, which was the teams’ BCS ranks…and in that, Oklahoma was ever so slightly ahead of Texas. Therefore Oklahoma was declared the division winner, which meant they would play the North Division winner, Missouri, in the Big 12 conference title game.
There’s far more to it than just playing in that game though, because Oklahoma and Texas were ranked second and third in the national BCS rankings at the time. This meant that whoever played Missouri in that game would, assuming they won it (which was likely, since both teams were pretty clearly better than Missouri), subsequently play for the national championship, while the other team would not. For Oklahoma this is obvious, because they were already #2; #3 Texas, for their part, would very likely have jumped over Oklahoma by virtue of playing and winning another game against a good team. So the Big 12’s tiebreaker rules essentially determined who played in the BCS NCG. Oklahoma played Missouri, beat them, rose to #1 in the BCS rankings when #1 Alabama lost, and went on to the BCS NCG, which they lost to Florida 24-14.
Texas simply never got a shot at the national title–they just sat there idle at #3, while #4 Florida beat #1 Alabama in the SEC title game and #2 Oklahoma beat Missouri in the Big 12 title game. Texas and their fans were outraged, and for good reason, because they had in fact beaten Oklahoma (their hated rival), on a neutral field during the season! Here, I’m going to add an additional reason for them to be upset–one never raised at the time–to my knowledge–and based on some simple but not-so-obvious mathematics of the BCS computation.
Before the 2008 Big 12 title game, the two human polls were split on whether Texas or Oklahoma was better, but it was extremely close in both cases. The coaches’ poll had it 0.9161 to 0.9154 in favor of Oklahoma, a difference of just 0.0007. The Harris poll had it 0.9115 to 0.9094 in favor of Texas, a difference of 0.0021. Since the weights of the two polls are equal, and the Texas advantage in the Harris poll was greater than their disadvantage in the coaches’ poll, this meant Texas had a razor’s edge lead and thus that the final verdict would almost certainly rest on what the computers had to say.
Now for the nitty gritty math subtlety, and it involves the varying measurement resolution of the three BCS ranking components. The Harris poll had 114 voters at the time, and with an integer point scale and a maximum possible score of 25 from any single voter, you have an inherent measurement resolution of 1/(114*25) = 1/2850, or roughly 0.00035. For the coaches poll, there were 61 voters and the same point scale, so the resolution is 1/1525, or roughly 0.0007, about twice as coarse. But for the computers... for the computers there are only 4 effective “voters”, because there are only six to start with and the high and low scores are thrown out. So the measurement resolution is 1/(4*25), i.e. 1/100, a much coarser resolution than for either of the human polls.
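To make the comparison concrete, here is a minimal Python sketch of those resolutions, using the voter counts given above. The function and variable names are mine, and "resolution" here just means the score change produced by a single point on a single ballot.

```python
POINTS_PER_BALLOT = 25  # a #1 vote is worth 25 points


def resolution(n_voters):
    """Smallest possible step in a component score: one point out of the maximum."""
    return 1 / (POINTS_PER_BALLOT * n_voters)


harris = resolution(114)    # 1 / (25 * 114) = 1/2850
coaches = resolution(61)    # 1 / (25 * 61)  = 1/1525
computers = resolution(4)   # 1 / (25 * 4)   = 1/100; four systems remain after trimming

print(f"Harris:    {harris:.5f}")
print(f"Coaches:   {coaches:.5f}")
print(f"Computers: {computers:.5f}")
print(f"Computer steps are about {computers / harris:.1f}x larger than Harris steps")
print(f"Computer steps are about {computers / coaches:.1f}x larger than coaches' steps")
```

In other words, a one-rank change by one computer moves a team's computer score by as much as a one-rank change on roughly 28 Harris ballots or 15 coaches' ballots.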
The upshot of this is that, in both of the human polls, there are so many voters that if somebody ranks Team A ahead or behind Team B by a rank or two, it doesn’t matter much in those teams’ final scores. But in the computer ranking, it can in fact matter a great deal. And that is exactly what happened between Texas and Oklahoma. The mean of the computer rankings had it 0.980 to 0.940, in favor of Oklahoma, and this difference of 0.040 far exceeded Texas’ slight advantage in the two human polls, and thus gave the final nod to Oklahoma.
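Plugging the component scores quoted above into the equal-weight average shows how lopsided the contributions were. This is only a sketch using the numbers in this post; the dictionary layout and names are mine.

```python
# Component scores quoted in the text, before the 2008 Big 12 title game
oklahoma = {"harris": 0.9094, "coaches": 0.9161, "computers": 0.980}
texas = {"harris": 0.9115, "coaches": 0.9154, "computers": 0.940}


def bcs_average(components):
    """Equal-weight mean of the three BCS components."""
    return sum(components.values()) / len(components)


# Texas's net edge across the two human polls combined
human_edge_texas = (texas["harris"] - oklahoma["harris"]) + (
    texas["coaches"] - oklahoma["coaches"]
)
# Oklahoma's edge in the rank-based computer component
computer_edge_oklahoma = oklahoma["computers"] - texas["computers"]

print(f"Texas net human-poll edge: {human_edge_texas:.4f}")
print(f"Oklahoma computer edge:    {computer_edge_oklahoma:.4f}")
print(f"Oklahoma BCS average:      {bcs_average(oklahoma):.4f}")
print(f"Texas BCS average:         {bcs_average(texas):.4f}")
```

Texas's combined human-poll edge is about 0.0014, while Oklahoma's computer edge is 0.04, nearly thirty times larger, so the computers alone settle the final order.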
But that 0.04 difference was not accurate–it resulted largely from converting the raw computer scores to integer ranks. The raw computer scores that are spit out are real numbers, i.e. they go out to several decimal places, inherent in the algorithms used–and if you use those values instead you get a different picture entirely. Now, there is no way to know the theoretical maximum value for the raw values in the computer scores (because the algorithms are unknown), so it’s not possible to standardize these raw scores to the same 0 to 1 scale that is possible when using integer ranks, as the BCS does (described above). But I can easily get around this problem by simply putting all three BCS component scores on a ratio basis, one school’s value divided by the other’s (e.g. Texas/Oklahoma), since I only need to compare these two teams (and still throwing out the computers’ high and low values). And when I do so, I get the following ratios, to four decimal places, for each BCS component:
Harris Poll: 1.0023
Coaches’ Poll: 0.9992
Computers (raw scores): 0.9946
If I just use the BCS computer scores, as computed by the BCS from integer ranks, I get a TX/OK ratio of 0.94/0.98 = 0.9592, far less favorable to Texas than is the 0.9946 computed using the raw computer scores. Big difference!
And averaging the three, as per BCS practice, I get 0.9987, very near a value of 1.0 that would represent absolute equality between the two teams, but very slightly favoring Oklahoma. This raises the question of whether such resolution, to several decimal places, has any meaning whatsoever in determining the better team. Since the resolution of the coaches poll is 1/1525, or about 0.0007, this 0.0013 difference from unity represents about two voters placing either team a single rank lower, or higher, on their ballot. For the Harris poll it represents about four voters doing so, voters who, again, are free to use any decision-making process they like, including biased ones. So the bottom line is that converting computer scores to ranks introduced an error very favorable to Oklahoma, and if you remove that by using the raw scores, you get a razor’s edge advantage for them that is essentially meaningless in any practical sense.
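The ratio arithmetic can be checked with a few lines of Python. The two poll ratios are computed from the poll scores quoted earlier in this post; the raw-score computer ratio of 0.9946 is simply taken as given, since I don't reproduce the raw computer values here. Variable names are mine.

```python
# Poll scores quoted earlier in the post
texas_coaches, oklahoma_coaches = 0.9154, 0.9161
texas_harris, oklahoma_harris = 0.9115, 0.9094

raw_computer_ratio = 0.9946         # from the raw computer scores (taken as given)
rank_computer_ratio = 0.94 / 0.98   # from the BCS's rank-based computer scores

# Texas/Oklahoma ratios for the two human polls
coaches_ratio = texas_coaches / oklahoma_coaches
harris_ratio = texas_harris / oklahoma_harris


def mean(values):
    return sum(values) / len(values)


with_raw = mean([harris_ratio, coaches_ratio, raw_computer_ratio])
with_ranks = mean([harris_ratio, coaches_ratio, rank_computer_ratio])

print(f"Harris TX/OK:         {harris_ratio:.4f}")
print(f"Coaches TX/OK:        {coaches_ratio:.4f}")
print(f"Mean, raw computers:  {with_raw:.4f}")
print(f"Mean, rank computers: {with_ranks:.4f}")
```

With the raw computer ratio the mean is about 0.9987, a near dead heat; with the rank-based ratio it drops to about 0.987, roughly ten times further from 1.0.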
This situation is not specific to what happened in 2008; it can happen at any time, and there are likely other similar examples I could point to if I did a thorough analysis of every year’s BCS scores. It could well happen again this year with Auburn and Ohio State, or Alabama and Missouri, or other possibilities. The BCS NCG is a huge event; there is an enormous amount of money and prestige for the schools involved, and they also gain major benefits in national media exposure that almost certainly affect their recruiting success in the years following the game, which will likely have a positive feedback effect on future success. So the math behind the determination of who gets selected to play in it is important.
And yet the people who designed the system didn’t even manage to think through some simple mathematical issues that can strongly affect the BCS rankings, and hence who plays for the national championship, and also who is and is not eligible for other post-season bowl games. Oh well, like I said, I’m not in charge, and next year these potential problems in determining the national champion will be reduced by a new playoff system.