Open up

Well I don’t think for pleasure
It’s just hard not to do
My thinking is a measure of how much I need a clue
I’m still flying blind
Hoping I might find
A way to stop my thinking and open up my mind

My feelings hurt me plenty
Not feeling hurts me more
Feeling’s got me kneeling down, wounded to the core
Not feeling’s got its charms
But you’ll find out who it harms
When your lover soon discovers you can’t open up your arms

It’s like being in a prison
You lock yourself inside
A limited perspective, even with eyes open wide
But I can’t walk no more
Through scenes I’ve seen before
Why don’t you come to me, bring the key, and open up, open up the door

It’s just love and it’s a puzzle
Of that there is no doubt
Can’t do nothin’ with it and you sure can’t do without
Can’t learn your part
You won’t know where to start
‘Till you quit all your questions and open up, just open up, your heart

Chris Smither, Open Up

“The Lake, it is said, never gives up her dead…”

So, I’ve been learning a couple of Gordon Lightfoot songs lately, and reading various things, and thinking about my home town. And also realizing that Indian Summer will soon give way to something much less enjoyable. So this post is about all that.

A couple of days ago in the library I’m reading the January 2014 entries for the “Great Lakes Calendar” in the journal Inland Seas, covering a month that caused all kinds of mayhem on the Lakes, mostly involving ice and the breaking thereof. There, I see it noted that on Jan. 3 the “Wilfred Sykes loaded at Escanaba and was escorted by the tug Erika Kobasic”. The next day, up on Superior, the “Downbound Arthur M. Anderson stopped…to await daylight before attempting the Rock Cut…” while way down on Lake Erie “The Griffon was expected to…break out the ice-bound Cuyahoga, stopped at the end of the Sandusky Bay ship channel by heavily packed ice”. And that the next day, the Anderson was right behind the 1000-foot Mesabi Miner, when the latter rammed the icebreaker Hollyhock after an ice ridge slowed the breaker down, about 22 miles west of the Mackinac Bridge.

Inland Seas

I’m only marginally familiar with Great Lakes maritime history but I recognized two of those ship names immediately: the Griffon and the Arthur M. Anderson. They are tied to two major Great Lakes shipwrecks, and the very two that bracket all of the major maritime disasters no less. A third ship mentioned, the Wilfred Sykes, was involved in the later of the two as well.

The Griffon was the very first masted sailing ship on the Great Lakes, built by Robert LaSalle’s crew somewhere along the Niagara River, Canadian side, in 1679. It sailed across Lakes Erie, St Clair, Huron and Michigan to the vicinity of what is now Green Bay, Wisconsin, before sinking in far northern Lake Michigan, loaded with furs, on its return voyage. LaSalle was not killed, however, as he had decided to head south overland to explore a connection between the Great Lakes and Mississippi drainages (and then overland from there all the way back to Montreal!). Like just about everything the French explorers and trappers did in the area at the time, it’s a thoroughly outrageous story.

At the recent end, there’s presumably no need to mention what the last major Great Lakes ship disaster was, due at least in part to Lightfoot’s famous and outstanding ballad.

The Arthur M. Anderson was very intimately involved in the entire episode. It was the Anderson that trailed 10 to 20 miles behind one Edmund Fitzgerald, kept in radio communication with it, helped it navigate after its radar went out, and first alerted the US Coast Guard of its disappearance from the radar. Most heroically, it performed the first SAR (search and rescue) operation for potential survivors, right during the height of the storm. The ship had in fact already reached the relative safety of Whitefish Bay, but upon request by the Coast Guard, it voluntarily returned out into the horrendous open water conditions to perform the search. Which speaks volumes about the ship’s captain.

The Wilfred Sykes is involved also. It loaded ore at the same dock at the same time as the Fitzgerald, and was also bound for the Soo locks. But its captain, having looked closely at the weather forecast of a major storm crossing the lake, had decided to track close to the Ontario shoreline instead of across open water. Therefore, it was just the Fitzgerald and the Anderson that crossed the lake together on the furious and fateful afternoon and evening of November 10, 1975. [That link goes to a very interesting paper that recreates the wind and wave conditions before and during the storm, using a weather model, the available surface observations, and a wind-wave model.]

Nearly 40 years later, the mystery of exactly what led to the sinking is still not fully resolved. It is known that the ship sank so fast in such ferocious conditions that there was no chance for survival. The incident was major news in the Great Lakes area at the time, even nationally, and nowhere more so than in Toledo, Ohio. About 5 or 6 (strictly from memory) of the crew of 29 lived in the area. This included the captain, Ernest McSorley, who lived about 7 or 8 miles from us, and was tragically on his very last voyage before retirement. [The most common run of the Fitzgerald was from Superior, Wisconsin, to either Detroit or Toledo.] I can still vividly remember the front page story in the Toledo Blade the next day with the pictures of the missing crew. It seemed unbelievable that this could happen. The Great Lakes are littered with uncounted shipwrecks, but this was 1975.

Anyway, today I’m back in the same library, this time reading J.B. Mansfield’s History of the Great Lakes wherein I read:

“Another very interesting, and very sad, thing about this lake [Superior], says W.S. Harwood in St. Nicholas, is that it never gives up its dead. Whoever encounters terrible disaster— happily infrequent in the tourist season—and goes down in the angry, beautiful blue waters, never comes up again. From those earliest days when the daring French voyageurs in their trim birch-bark canoes skirted the picturesque shores of this noble but relentless lake, down to this present moment, those who have met their deaths in mid-Superior still lie at the stonepaved bottom. It may be said that, so very cold is the water, some of their bodies may have been preserved through the centuries. Sometimes, not far from the shore, the bodies of people who have been wrecked from fishing-smacks or from pleasure-boats overtaken by a cruel squall have been recovered, but only after the most heroic efforts with drag-net or by the diver.”

So, to get back to the title, this is the origin, or at least the earliest known explanation, of the sentence “The Lake it is said never gives up her dead, when the gales of November come early” in Gordon Lightfoot’s song. More on that whole issue is here.

“And all that remains are the faces and the names of the wives, and the sons and the daughters.”

“The number of medicine men in active service”

The medicine man was an institution of Piutedom…The distinction was not what might be termed a popular honor. Whether the selection was made for some hereditary reason, or because of some event at his birth or in the early life of the doctor, his status was established at an age when he had no chance to object. It does not appear that he was expected to employ his skill until he had reached reasonably mature years, but his status was settled, however he might resent it when he came to understand the part cast for him in the drama of life. And resent it he usually did, for as soon as his ministrations had sent a sufficient number–generally three–of his fellows to the happy hunting grounds his own violent and sudden removal from mundane affairs would come as a matter of custom.

Among the former Piute residents of Owens Valley, during the early years of white occupation, was one Jim, who had been selected by fate for a doctor’s career. In consequence, Jim constantly carried a “sixteen-shoot gun”, prepared at all times to “heap kill um” if there were attempts either to force him to practice or to fasten on him the results of some other person’s lack of skill in exorcising evil spirits…

The standard of medical success, if not skill, required of Piute medicos was higher than among civilized peoples; for while a white doctor is in no danger of violence whatever his (or his patient’s) luck, the Piute healer did well to arrange his affairs immediately on the demise of his third patient. He was marked for early and unceremonious removal, by whatever means might be convenient for the kin of his last case. Stones, arrows, lassos, in daylight or darkness, regardless of place or anything but opportunity, were used to reduce the number of medicine men in active service. It was approved tribal law.

Chalfant, W.A. (1922). The Story of Inyo

Ebola epidemic update

Today a new and more extensive WHO W. Africa ebola update was released, including data current as of Sept. 14, four days ago. I’ve therefore compiled new tables, and case and death rates. The new code and graphs are here, and the new data table is here. Liberia-specific graphs are here.

There’s been a slight drop in the transmission rate, based on these data. The daily rate is now estimated at about 1.038 (down from 1.043 a month ago). The 6-12 day rates, which correspond roughly with the estimate of R_zero, the per person rate of infection (depending on the mean infectiousness period, in days), range from 1.25 to 1.57. The midpoint value is 1.41. See here and here for my methodology.
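The step from a daily rate to the 6-12 day rates is just compounding, and can be sketched as follows. This is a minimal illustration, not the code linked above; the function names are hypothetical and the only input taken from the text is the daily rate of 1.038.

```python
def rate_over_period(daily_rate, days):
    """Compound a per-day growth factor over a period of `days` days."""
    return daily_rate ** days


def daily_rate_from_period(period_rate, days):
    """Inverse operation: recover the per-day factor from a multi-day factor."""
    return period_rate ** (1.0 / days)


if __name__ == "__main__":
    daily = 1.038  # per-day growth factor quoted in the text
    # 6 to 12 days is the assumed range of the mean infectiousness period.
    for days in (6, 9, 12):
        print(days, round(rate_over_period(daily, days), 2))
```

With the rounded daily value of 1.038 this gives roughly 1.25 at 6 days and 1.56 at 12 days; the 1.57 quoted above presumably reflects the unrounded daily estimate.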

It is almost certain that cases are going unreported, however, and possibly many of them; I don’t know how many. These estimates are therefore underestimates of the true rate, and hence the severity of the outbreak. And this kind of thing is certainly tragic and not helping the situation.

This week’s puzzler

This week’s puzzler comes to us from John Storthwaite in Stonyfield, Minnesota, who has been wondering why there are so many trees blocking his view of the rocks up there.

Suppose you have been given the following problem. A number of objects are located in some given area, say trees in a forest for example, and one wishes to estimate their density D (number per unit area). Distance-based sampling involves estimating D by averaging a sample of squared, point-to-object distances (d), for objects of known integer rank distance (r) from the point. The distances are squared because one is converting from one dimensional measurements (distance) to a two dimensional variable (objects per unit area).

So here’s the puzzler. If you run a line through this arbitrary point, and choose the closest objects (r = 1) on each side of it, what will be the ratio of the squared distances of the two objects and how would you solve this, analytically? Would they be about the same distance away? If not, would there be a predictable relationship between them? The problem can be extended to any number of lines passing through said point, just with correspondingly more pairs of distances to evaluate.

The first questions one should ask here are clear: (1) “Why on earth would anybody want to do that?” and (2) “Is that the type of thing you clowns spend your time on?”. We have answers for those questions. Not necessarily satisfactory answers, but answers nonetheless. Giving an answer, that’s the important thing in life. So, if you know the answer, write it on the back of a $100 bill and send it to…

Anyway, there are two possible solutions here. The first one comes readily if one realizes that the densities within sectors must each be about the same as the overall density, since we assume a homogeneous overall density. But, for a given value of r, the squared distances in each of the two sectors must be, on average, about twice those for the collection of trees overall, because there are only half as many trees in each sector as there are overall. So, e.g. the r = 5th closest trees within each half are on average, 2X the squared distance of the r = 5th closest tree overall.

Knowing this, the relationship between the two r = 1 trees (label them r1.1 and r1.2 having squared distances d1.1 and d1.2) in the two sectors becomes clear. Since one of the two trees (r1.1) must necessarily be the r = 1 tree overall, and the mean squared distance of the two trees must be 2X that of the r = 1 tree, this translates to:

2*d1.1 = (d1.1 + d1.2)/2 and thus,
d1.2 = 3(d1.1),

i.e., one member of the pair will, on average, be exactly three times the squared distance of the other. This result can be confirmed by an entirely independent method involving asymptotic binomial/multinomial probability. That exercise is left, as they say in the ultimate cop-out, to the reader.
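For readers who would rather not do the exercise, the 3:1 result can also be checked by brute force. The sketch below is a simple Monte Carlo check, not the independent analytical method mentioned above. It assumes trees scattered uniformly at random in a square around the sample point, with the dividing line taken as x = 0; the function names, point count and trial count are all arbitrary choices for illustration.

```python
import random


def nearest_sq_dist_by_side(points):
    """Smallest squared distance to the origin on each side of the line x = 0."""
    left = right = None
    for x, y in points:
        d2 = x * x + y * y
        if x < 0:
            if left is None or d2 < left:
                left = d2
        else:
            if right is None or d2 < right:
                right = d2
    return left, right


def simulate_ratio(trials=10000, n_points=40, seed=42):
    """Ratio of mean squared distances: farther r = 1 tree over nearer r = 1 tree."""
    rng = random.Random(seed)
    sum_near = sum_far = 0.0
    used = 0
    for _ in range(trials):
        # Homogeneous density: uniform random points in the square [-1, 1]^2.
        pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_points)]
        left, right = nearest_sq_dist_by_side(pts)
        if left is None or right is None:
            continue  # one side happened to be empty; skip this trial
        sum_near += min(left, right)
        sum_far += max(left, right)
        used += 1
    return (sum_far / used) / (sum_near / used)


if __name__ == "__main__":
    print(simulate_ratio())
```

The returned ratio should come out close to 3, with some sampling noise. Note that it is the ratio of the mean squared distances that is being checked, which is what the "on average" in the argument above refers to, not the mean of the individual ratios.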

This work has highly important implications with respect to cancer research, and for solutions to poverty, malnutrition, and climate change. It can also help one discern if tree samplers 150-200 years ago were often sampling the closest trees or not.

Funding for this work was provided by the Doris Duke Foundation, the Society for American Baseball Research, the American Bean and Tree Counters Society, the Society for Measuring Things Across From Other Things, and the Philosophy Department at the University of Hullaballo. All rights reserved, all obligations denied. Any re-use, re-broadcast, retransmission, regurgitation or other use of the accounts and descriptions herein, without the express written consent of the closest random stranger on the street, or the closest random stranger on the other side of said street, is strictly prohibited.

Golf course succession

A friend’s property, in the county my parents live in, is surrounded by a nine hole golf course that went out of business several years ago, and is about to be acquired by the US Fish and Wildlife Service. It is undergoing rapid ecological succession to a less managed state since they stopped mowing a few years back. This process is very common with abandoned farm land, but this is the first time I’ve looked at a golf course. The place is interesting because the area is naturally wet, being originally part of a very large swamp/wetland complex (the “Great Black Swamp”) that stretched over many counties and caused this area to be the last settled in Ohio. The original vegetation, documented in 1820, was dominated by intermixed treeless wet prairie, and swamp or other northern wetland hardwoods, with standing water over the entire year common. The inherently wet soils might well have affected the course’s success, I don’t know.

Several tree species mentioned in the 1820 GLO land survey notes (see bottom image) are still present, including swamp white oak (Quercus bicolor), American elm (Ulmus americana), pin oak (Q. palustris), green ash (Fraxinus pennsylvanica), hickory (Carya cordiformis), eastern cottonwood (Populus deltoides), and unspecified willows (Salix spp.). Others have clearly come in post-settlement, including black walnut (Juglans nigra), northern catalpa (Catalpa speciosa), weeping willow (Salix babylonica), possibly silver maple (Acer saccharinum), and the completely misplaced jack pine (Pinus banksiana) and eastern redcedar (Juniperus virginiana) (most likely both as yard markers and fairway dividers). How the USFWS will manage the property will be interesting; it may be difficult to recreate the wet prairie habitat given that the natural drainage pattern is now highly altered by ditching and drain tiling.

Wet prairie and hardwood swamp, to farm, to golf course, to...


Goldenrod (Solidago spp), a notorious and obvious late bloomer.



Camping among the tombs

I found a road which led me to the Bonaventure graveyard. If that burying-ground across the Sea of Galilee, mentioned in Scripture, was half as beautiful as Bonaventure, I do not wonder that a man should dwell among the tombs. It is only three or four miles from Savannah…Part of the grounds was cultivated and planted with live-oak, about a hundred years ago, by a wealthy gentleman who had his country residence here. But much the greater part is undisturbed. Even those spots which are disordered by art, Nature is ever at work to reclaim, and to make them look as if the foot of man had never known them. Only a small plot of ground is occupied with graves and the old mansion is in ruins.

Bonaventure Cemetery


“I ventured out…”

A wild scene, but not a safe one, is made by the moon as it appears through the edge of the Yosemite Fall when one is behind it. Once…I ventured out on the narrow ledge that extends back of the fall…and wishing to look at the moon through some of the denser portions of the fall, I ventured to creep further behind it. The effect was enchanting: fine, savage music sounding above, beneath, around me, while the moon, apparently in the very midst of the rushing waters, seemed to be struggling to keep her place, on account of the ever-varying form and density of the water masses through which she was seen…I was in fairy land between the dark wall and the wild throng of illumined waters, but suffered sudden disenchantment; for like the witch scene in Alloway Kirk, “in an instant all was dark”. Down came a dash of spent comets, thin and harmless looking in the distance, but they felt desperately solid and stony when they struck my shoulders, like a mixture of choking spray and gravel and big hailstones. Instinctively dropping on my knees, I gripped an angle of the rock, curled up like a fern frond with my face pressed against my breast, and in this way submitted as best I could to my thundering bath…How fast one’s thoughts burn in such times of stress. I was weighing chances of escape. Would the column be swayed a few inches from the wall, or would it come yet closer? The fall was in flood and not so lightly would its ponderous mass be swayed. My fate seemed to depend on the fate of the “idle wind”…

John Muir, The Yosemite, p.30

The Yosemite
Yos Falls 1900

Who’s “best” and how do you know it?

So suppose you have your basic Major League Baseball (MLB) structure, consisting of two leagues having three divisions of five teams each, each of which plays a 162 game, strongly unbalanced*, schedule. There are, of course, inherent quality differences in those teams; some are better than others, when assessed over some very large number of games, i.e. “asymptotically” **. The question thus arises in your mind as you ponder why the batter feels the need to step out of the batter’s box after each pitch ***: “how often will the truly best team(s) win their league championships and thus play each other in the World Series”. The current playoff structure involves having the two wild card teams play each other in a one game elimination, which gives four remaining playoff teams in each league. Two pairings are made and whoever wins three games advances to the league championship series, which in turn requires winning four games.

I simulated 1000 seasons of 162 games with leagues having this structure. Inherent team quality was set by a normal distribution with a mean of 81 wins and a standard deviation of ~7, such that the very best teams would occasionally win about 2/3 (108) of their games, and the worst would lose about that same fraction. Win percentages like those are pretty realistic, and the best record in each league frequently falls between 95 and 100 wins.

1) The truly best team in each league makes the playoffs about 80 percent of the time under the current system, less when only four teams make it.
2) That team wins its league championship roughly 20 to 30 percent of the time, getting knocked out in the playoffs over half the time. It wins the whole shebang about 10 to 15 percent of the time.
3) Whenever MLB expands to 32 teams, in which the playoff structure will very likely consist of the four division winners in each league and no wild card teams, the truly best (and second and third best) teams in each league will both make the playoffs, and advance to the World Series, less frequently than they do now.
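A simulation of this general kind can be sketched as below. This is not the code behind the numbers above; it is a simplified stand-in for one league under several assumptions of my own: single-game outcomes follow the "log5" formula, each team plays 23 games against each division rival and 7 against each other league opponent (which sums to 162), ties in the standings are broken at random, and the best division winner faces the wild card winner. All function names are hypothetical.

```python
import random


def win_prob(p_a, p_b):
    """Chance that team A beats team B (log5 formula; an assumption here)."""
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + p_b * (1 - p_a))


def series(a, b, q, wins_needed, rng):
    """Play a first-to-`wins_needed` series; return the winning team's index."""
    wa = wb = 0
    p = win_prob(q[a], q[b])
    while wa < wins_needed and wb < wins_needed:
        if rng.random() < p:
            wa += 1
        else:
            wb += 1
    return a if wa == wins_needed else b


def simulate_league(rng, sd_wins=7.0, intra=23, inter=7):
    """One league season plus playoffs.

    Returns (best_made_playoffs, best_won_pennant) for the truly best team.
    """
    n = 15
    # True quality: win probability against a perfectly average team.
    q = [min(0.95, max(0.05, rng.gauss(0.5, sd_wins / 162))) for _ in range(n)]
    division = [i // 5 for i in range(n)]
    wins = [0] * n
    for a in range(n):
        for b in range(a + 1, n):
            games = intra if division[a] == division[b] else inter
            p = win_prob(q[a], q[b])
            for _ in range(games):
                if rng.random() < p:
                    wins[a] += 1
                else:
                    wins[b] += 1
    # Rank teams by wins, breaking ties at random.
    order = sorted(range(n), key=lambda t: (wins[t], rng.random()), reverse=True)
    div_winners = [next(t for t in order if division[t] == d) for d in range(3)]
    div_winners.sort(key=order.index)  # best record first
    wild = [t for t in order if t not in div_winners][:2]
    wc = series(wild[0], wild[1], q, 1, rng)            # one-game wild card
    s1 = series(div_winners[0], wc, q, 3, rng)          # first to three wins
    s2 = series(div_winners[1], div_winners[2], q, 3, rng)
    champ = series(s1, s2, q, 4, rng)                   # first to four wins
    best = max(range(n), key=lambda t: q[t])
    return best in div_winners + wild, champ == best


def run(seasons=1000, seed=1):
    """Fraction of seasons in which the truly best team makes the playoffs / wins the league."""
    rng = random.Random(seed)
    made = won = 0
    for _ in range(seasons):
        m, w = simulate_league(rng)
        made += m
        won += w
    return made / seasons, won / seasons


if __name__ == "__main__":
    print(run())
```

Changing `sd_wins` is the quickest way to see how much the spread in inherent team quality drives the results. The exact fractions will depend on the simplifying assumptions listed above, so they need not match the figures in the list.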

This type of analysis is generalizable to other types of competitions under structured systems, at least for those in which the losers of individual contests live to fight another day, or if they don’t, are replaced by others of the same basic quality. The inherent spread in team quality makes a very big difference in the results obtained however. It’ll apply very well to baseball and hockey, but not so well to the NBA, for example.

So the next time an MLB team wins its league, or the World Series, and you’re tempted to think this means they must be the best team in the league (or MLB overall), think about that again. Same for the NHL.

* Currently, each team plays around 3 times as many games against each intra-division opponent as inter-division opponents, not even including the 20 inter-league games (which I’ve ignored in these analyses, assuming all games are within-league).
** These records are conceived of as being amassed against some hypothetical, perfectly average team. This team is from Lake Wobegon, Minnesota.
*** It is perfectly OK to think other things of course, and we need not worry about the particulars of the language embodied therein.

What’s complex and what’s simple in an exponential model?

In the post on estimating the rate of spread in the current ebola epidemic, a commenter stated that using a monthly rate of disease spread in Liberia was a “simpler” model than what I had done, which was based on a daily rate. This is not correct and I want to clarify why here.

In fact I used a very simple model–an exponential model, which has the form y = b^(ax). You can’t get any simpler than a one parameter model, and that fact doesn’t change just because you alter the value of the base b. Any base can model an exponential increase; changing it just requires a corresponding change in parameter a, for a given pair of y and x variables. Base choice ought to be done in a way that carries some meaning. For example, if you’re inherently interested in the doubling time of something, then 2 is the logical choice*. But when no particular base value is obvious, it’s still best if the value used carries meaning in terms of the values of x, i.e. where a = 1.0, presuming that x is measured on some scale that has inherent interest. In my case, that’s the per-day increase in ebola cases.

However, if you fit an exponential model to some data, most programs will use a base of e (~2.718) or 10 by default; the base is fixed and the rate of change is then determined with respect to the units of ax. That’s a bit backwards frankly, but not a big deal, because the base used can easily be converted to whatever base is more meaningful relative to the data at hand. Say for example, that your model fitting procedure gives y = e^(3.2x), where b = e and a = 3.2. But if your x variable is recorded in say, days, you may well not be interested in the fact that y grows by a factor of e every 1/3.2 days: you want to know the per-day rate of change. Well, y = e^(ax) is simply y = (e^a)^x, and so in this case b = e^(3.2) = 24.5; it takes a larger base to return a given y value if the exponent is smaller. It’s just a straight mathematical transformation (e^a), where a is whatever value is returned in the exponential model fitting. It has nothing to do with model complexity. It has instead to do with scaling, ease of interpretation and convenience.
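The base conversion described above takes only a couple of lines. The sketch below is illustrative; the function names are hypothetical and the value a = 3.2 is the one from the example in the text.

```python
import math


def per_unit_base(a):
    """Rewrite y = e^(a*x) as y = b^x and return b = e^a."""
    return math.exp(a)


def rate_in_base(a, new_base):
    """Return a2 such that e^(a*x) equals new_base^(a2*x) for every x."""
    return a / math.log(new_base)


if __name__ == "__main__":
    a = 3.2                     # fitted rate from the example in the text
    b = per_unit_base(a)        # per-unit-of-x growth factor, about 24.5
    a2 = rate_in_base(a, 2.0)   # the same curve expressed in base 2
    x = 1.7                     # any x value; the three forms agree
    print(b, math.exp(a * x), b ** x, 2.0 ** (a2 * x))
```

All three expressions describe the same curve, which is the point: changing the base rescales the parameter but adds no complexity to the model.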

The relevance to the ebola transmission rate modeling and the original comment is that those rates could very well change within a month’s time due to radical changes in the population’s behavior (critical), or perhaps drug availability (unlikely in this case). In a disease epidemic what happens from day to day is critical. So you want to use a time scale that allows you to detect system changes quickly, while (in this case) also acknowledging the noise generated by the data reporting process (which complicates things and was the whole point of using loess to smooth the raw data before making the estimates). Note that I’ve not gone into the issue of how to detect when an exponential growth rate has changed to some other type of growth. That’s much more difficult.

*Exponential functions are also useful for analyzing outcomes of trials with categorical variables, where a = 1 and b defines the number of possible outcomes of some repeated process. For example y = 2^25 gives the total number of possible permutations of 25 trials of an event having two possible outcomes. But that’s a different application than modeling a change rate (unless you want to consider the increase in the number of possible permutations a rate).

It just won’t get you there

I’ve got two little feet to get me across the mountain
Two little feet to carry me away into the woods
Two little feet
A big mountain
And a cloud coming down, a cloud a comin’ down

I hear the voices of the ancient ones
Chanting magic words from a different time
Well there is no time, there is only this rain
There is no time
That’s why I missed my plane

John Muir walked away into the mountains
With his old overcoat and a crust of bread in his pocket
We have no knowledge and so we have stuff
But stuff with no knowledge is never enough to get you there…
It just won’t get you there

Greg Brown, Two Little Feet

Estimating the spread rate in the current ebola epidemic

I’ve now written several articles on the West African ebola outbreak (see e.g. here, here, here, and here). This time I want to get more analytical, by describing how I estimated the ebola basic reproduction rate Ro (“R zero”), i.e. the rate of infection spread. Almost certainly various people are making these estimates, but I’ve not seen any yet, including at the WHO and CDC websites or the few articles that have come out to date.

Some background first. Ro is a fundamental parameter in epidemiology, conceptually similar to r, the “intrinsic rate of increase”, in population biology (I’ll refer to it as just R here). It’s defined as the mean number of secondary disease cases arising from a primary case. When an individual is infected, he or she is a secondary case relative to whoever infected him or her, and in turn becomes a primary case capable of spreading the disease to others. Estimates of R depend strongly on both the biology of the virus, and the behavior of the infected. It is thus more context dependent than population biology’s r parameter, which assumes idealized conditions and depends more strictly on biologically limiting parameters (lifespan, age to first reproduction, gestation time etc.). Diseases which are highly contagious, like measles, smallpox and the flu, have relatively high R values, whereas those requiring direct contact or exchange of body fluids, like HIV, have rates which are at least potentially much lower, depending on the behavior of the infected.

To stop an epidemic of any disease, it is necessary to first lower R, and eventually bring it near zero. Any value of R > 0.0 indicates a disease with at least some activity in the population of concern. When R = 1.0, there is a steady increase in the total number of cases, but no change in the rate of infection (new cases per unit time): each infected person infects (on average) exactly 1.0 other person. Any R > 1.0 indicates a (necessarily exponential) increase in the infection rate, that is, the rate of new cases per unit time (not just the total number of cases), is increasing. It’s also possible to get a constant, rather than accelerating, increase in the number of new cases, but that’s an unstable equilibrium requiring a steady decrease of R from values > 1.0, and is thus uncommon.
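The difference between these cases is easy to see with a toy calculation. The sketch below assumes a deliberately crude, deterministic model in which each case infects exactly R others one "generation" later; it is an illustration only, not the estimation method used in this post, and the function names are hypothetical.

```python
def new_cases_per_generation(r, generations, initial=1.0):
    """New cases in each generation if every case infects r others on average."""
    return [initial * r ** g for g in range(generations)]


def cumulative(cases):
    """Running total of cases across generations."""
    total = 0.0
    out = []
    for c in cases:
        total += c
        out.append(total)
    return out


if __name__ == "__main__":
    for r in (0.5, 1.0, 1.41):  # 1.41 is used purely as an example value > 1
        new = new_cases_per_generation(r, 6)
        print(r, [round(c, 2) for c in new], [round(c, 2) for c in cumulative(new)])
```

With R = 1.0 the new cases per generation stay constant while the total climbs steadily; with R > 1.0 the new cases themselves grow each generation; with R < 1.0 they shrink toward zero.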


Liberian ebola rate jumps

Updated as of 09-18-2014 WHO report.

Many reports from on-the-ground workers with the WHO, Doctors Without Borders, state health and aid agencies, etc. have commented that the case and death rates in at least some locations have almost certainly been too low, because of a substantial number of people avoiding going to clinics and hospitals, out of fear primarily. This situation seems to be the worst in Liberia. See this article for example. Today’s WHO-released data from Liberia may be confirmation of this, many new cases and deaths being reported there from August 16-18. Such a jump could be due to more intensive case tracking/finding. However, it is also possible that the epidemic is simply exploding there now, especially given that it is well established in the capital, Monrovia. Or it could be due to some combination of the two.

In the graphs below I used a pretty stiff “span” parameter (span = 1.0) in the loess smoothings (dark black lines) of the WHO-reported raw data (thin line). This choice gives about 35 deaths/day in Liberia. If I use something more flexible, span = 0.5 for example, the estimated rates are higher, about 47/day. However, it’s best to go stiff (i.e. conservative) here, because clearly there are major variations due to data gathering and reporting timelines that have been causing large fluctuations in the numbers (discussed more here).  But there’s also clearly more than just that going on with this latest surge in numbers.

This situation is now extremely serious, if it wasn’t already. Note also that negative rates early on in the outbreak are presumably due to case retractions or re-classifications. The code generating the data and graphs is here and the data table itself is here.
