Gauley time!

Black cherry trees in ripe fruit and goldenrod in full bloom: that can only mean one thing, and no, I’m not talking about football season.

They opened the Summersville Dam gates at 7 AM this morning; for God’s sake get down, or up, or over there in the next six weeks and try to kill yourself with all the others if you can. There will be a party, a rather large and extended one, and it’s anybody’s guess as to whether river flow will exceed that of beer. Now, when on the river, try to remember, a priori if possible, that plastic (or rubber) side down is optimal, that rocks are typically fairly hard, and to take a big gulp of air before you go under. Remembering these a posteriori is fairly automatic. Everything else is open to personal interpretation.

Best to put in downstream from the nozzle a bit, although I’m sure it’s been tried:
[Image: the dam’s open outflow tunnel]

What made America great:
[Image: a pink dory at Pillow Rapid]

This is probably sub-optimal form:

Definite sub-optimal form:
[Image: poor form]

This can be made to work for a while, like 12 seconds:
[Image: the mattress approach]

Really excellent form:
[Image: a well-executed boof]


When our songs were just like prayer

I don’t spend a lot of time listening to new music or artists, but I do spend some, both for the enjoyment and to get cover material or inspiration for new songs. Every now and then you run across someone or something truly special that you weren’t aware of; the following is a prime example. I highly recommend this guy.

Remember when our songs were just like prayer
Like gospel hymns that you called in the air
Come down, come down sweet reverence
Unto my simple house and ring… and ring

Ring like silver, ring like gold
Ring out those ghosts on the Ohio
Ring like clear day wedding bells
Were we the belly of the beast, or the sword that fell?
We’ll never tell

Come to me, clear and cold
On some sea
Watch the world spinning waves
Like that machine

Now I’ve been crazy, couldn’t you tell?
I threw stones at the stars, but the whole sky fell
Now I’m covered up in straw, belly up on the table
Well I drank and sang, and I passed in the stable

That tall grass grows high and brown
Well I dragged you straight in the muddy ground
And you sent me back to where I roam
Well I cursed and I cried, but now I know
Now I know

And I ran back to that hollow again
The moon was just a sliver back then
And I ached for my heart like some tin man
When it came, oh it beat and it boiled and it rang
Oh, it’s ringing

Ring like crazy, ring like hell
Turn me back into that wild haired gale
Ring like silver, ring like gold
Turn these diamonds straight back into coal
Turn these diamonds straight back into coal

The Stable Song, Gregory Alan Isakov

Way over yonder in the minor key

I lived in a place called Okfuskee
And I had a little girl in a holler tree
I said, little girl, it’s plain to see,
There ain’t nobody that can sing like me

She said it’s hard for me to see
How one little boy got so ugly
Yes, my little girly, that might be
But there ain’t nobody that can sing like me

Ain’t nobody that can sing like me
Way over yonder in the minor key
Way over yonder in the minor key
There ain’t nobody that can sing like me

We walked down by the Buckeye Creek
To see the frog eat the goggle eye bee
To hear that west wind whistle to the east
There ain’t nobody that can sing like me

Oh my little girly will you let me see
Way over yonder where the wind blows free
Nobody can see in our holler tree
And there ain’t nobody that can sing like me

Her mama cut a switch from a cherry tree
And laid it onto she and me
It stung lots worse than a hive of bees
But there ain’t nobody that can sing like me

Now I have walked a long long ways
And I still look back to my tanglewood days
I’ve led lots of girls since then to stray
Saying, ain’t nobody that can sing like me.

Woody Guthrie, Billy Bragg

Railroad Earth
Dave Carter and Tracy Grammer
Justin here will show you how to play it if interested, but it’s all G, C, D and Em.

Rate process estimation: Poisson vs gamma

This post is about estimating a rate process, particularly when certain required input data are missing or critical assumptions are unmet. Although it’s quite unrelated to recent posts (here and here), I’m actually headed somewhere definite with the collection of them, if you can bear with me. If you can’t bear with me, I can bear that.

As we know, estimating rate processes is a very common task in science, where “rate” is defined generally as a measured change in one variable per unit measure of another, usually expressed as a ratio or fraction. In the discussion that follows I’ll use object density–the number of objects of interest per unit area–to illustrate the concepts, but they are applicable to rates of any type.

Obtaining rate estimates, with rare exceptions, requires empirical sampling over the domain of interest, followed by some mathematical operation on the samples, often an averaging. Sampling, however, can take two opposing approaches, as determined by the measurement units of the denominator and the numerator. Using our density example, in the first approach we tally the objects occurring within samples of some a priori defined area (the denominator thus fixed), and then simply average the values over all samples. In the second, distance sampling (DS), approach, we instead measure to the nth closest objects from random starting points, with the value of n chosen a priori, and thus with the numerator fixed and the denominator varying. Low values of n are typically chosen for convenience, and distances are converted to density using a suitable density estimator.

This DS approach has definite advantages. For one, it is time efficient, as no plot delineations are required, and decisions on which objects to tally, or which have already been tallied, are typically easier. A second advantage is that the total sample size is independent of object density, and a third is an often superior scatter of samples through the study area, leading to a better characterization of spatial pattern.

The data from the two approaches are modeled by two well-established statistical distributions, the Poisson and the gamma, respectively. With the first approach, the number of objects falling in each of a collection of plots follows a Poisson distribution, with the exact values determined by the overall density. With the second approach, the distances to the nth closest objects follow a gamma distribution.
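To make that concrete, here’s a minimal simulation sketch in R (mine, not from the post), assuming complete spatial randomness; the density, study-area size, plot size, and number of sample points are made up for illustration. Note that the gamma applies most directly to the search areas (pi*r^2) implied by the nth-closest distances.

set.seed(1)
dens = 0.05                                     # objects per unit area (assumed)
side = 200                                      # square study area, side length (assumed)
N = rpois(1, dens * side^2)                     # realized number of objects
xy = cbind(runif(N, 0, side), runif(N, 0, side))

# Approach 1: counts in 10 x 10 plots are Poisson-distributed
plot.counts = sapply(1:1000, function(i){
 x0 = runif(1, 0, side - 10); y0 = runif(1, 0, side - 10)
 sum(xy[,1] >= x0 & xy[,1] < x0 + 10 & xy[,2] >= y0 & xy[,2] < y0 + 10)
})
mean(plot.counts) / 100                         # close to dens

# Approach 2: search areas to the n = 2nd closest object are gamma-distributed
n = 2
areas = sapply(1:1000, function(i){
 pt = runif(2, 50, 150)                         # random points, kept away from the edges
 d = sqrt((xy[,1] - pt[1])^2 + (xy[,2] - pt[2])^2)
 pi * sort(d)[n]^2
})
mean(areas)                                     # close to the gamma mean, n / dens
# compare: hist(areas, freq=FALSE); curve(dgamma(x, shape=n, rate=dens), add=TRUE)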

Either approach requires that objects be randomly located throughout the study area for unbiased estimates, but there is a second potential source of bias with distance sampling (DS), and hence a downside. This bias has a magnitude of (n+1)/n, so, e.g., measuring to the n = 1st closest object will bias the density estimate by a factor of 2, to the n = 2nd closest by 3/2, etc. The bias is easily removed simply by multiplying by the inverse, n/(n+1); equivalently, measuring to the 2nd closest objects will give the area corresponding to unit density, that is, the area at which object density equals one. Which is kind of interesting, assuming you’ve made it this far and nothing more enthralling occupies your mind at the moment.

The shape of the gamma distribution varies with n. When n = 1, the gamma assumes a negative exponential shape; when n > 1 it is unimodal, strongly skewed at low values of n, but of decreasing skew (more normal) at higher n. The upshot is that one can diagnose n if it is unknown, because these distributions will all differ, at least somewhat, especially in their higher moments, such as the variance and skew. However, discriminatory power decreases with increasing n, and the approach also assumes the objects are spatially random, which of course they may in fact not be.
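A rough sketch of that diagnostic idea (again mine, again assuming spatial randomness, and with the rate arbitrarily set to 1): the search areas to the nth closest object are gamma with shape n, and their variance and skew separate candidate values of n, the skew shrinking as n grows.

set.seed(3)
skew = function(x) mean((x - mean(x))^3) / sd(x)^3   # simple moment-based skewness
round(sapply(1:5, function(n){
 a = rgamma(1e5, shape = n, rate = 1)                # simulated search areas for a given n
 c(n = n, mean = mean(a), var = var(a), skew = skew(a))
}), 2)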

If our rate denominator’s measurement space is defined on more than one dimension–as it is in our example (area, two dimensions)–we can subdivide it into equally sized sectors and measure distances to the nth closest object within each sector. Sectors here equate to angles emanating from the sample point, and this “angle-order” method increases the sample size at each sample point. Supposing four sectors, for a given value of n, the four measurements at a point give (four choose two) = six possible ratios between the measured distances, each ratio defined as the farther distance over the closer. The distributions of these six values, over a collection of sample points, are then collectively discriminatory for n.
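A minimal sketch of that bookkeeping at a single sample point, assuming four sectors; the distances are made up purely for illustration.

d = c(3.1, 4.7, 2.2, 6.0)                            # nth-closest distance in each of 4 sectors
pairs = combn(4, 2)                                  # the choose(4,2) = 6 sector pairs
apply(pairs, 2, function(p) max(d[p]) / min(d[p]))   # each ratio: farther distance over closer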

Usually both the rate and its homogeneity are unknown, and we need the latter to get the former. If a non-random pattern exists, the non-randomness can be quantified (in several ways) if we know n, and density estimates made using suitable corrections. If we don’t know n but do know that the objects are in fact randomly arranged, we can still infer density, although not with nearly as much precision as when we do know n. Interestingly, the bias arising from a clustered non-random pattern is of the same direction as that from an under-estimate of n, both leading to density under-estimates. Similarly, a tendency toward spatial pattern regularity gives a bias in the same direction as an overestimate of n.

These last facts can be useful, because our main interest is not n or the spatial pattern, but rather the rate (density), and the effects of uncertainties in them are not additive, and can even be compensating, depending on the situation. This serves to greatly constrain the total uncertainty in the rate estimate, even when these critical pieces of input information are lacking. I’m going to try to expound on that in the next post in this utterly enthralling series.

Support for this post has been provided by the Society for Public Education of Not Very Well Recognized or Cared About Issues. Nothing in the post should be construed, or mis-construed, as in any way necessarily reflecting the views, opinions, sentiments, thoughts, conceptual leanings, quasi-conscious daydreaming or water cooler banter of said Society, or really of anyone in particular other than me, and even that is open to debate. You may now return to your regularly scheduled life.

Revisited, revisited

Fifty years ago, later this week. August 30, 1965, to be exact.

[Image: Highway 61 Revisited album cover]

One of the more interesting, unexpected and outlandish albums in modern American music. And the first of many changes in musical direction for its author, who wasn’t afraid to upset the apple cart of expectations and go in some new direction. As he’s been quoted as saying, “I try my hardest to be just like I am.”

Oh God said to Abraham, “Kill me a son”
Abe says, “Man, you must be puttin’ me on”
God say, “No.” Abe say, “What?”
God say, “You can do what you want Abe, but
The next time you see me comin’ you better run”
Well Abe says, “Where do you want this killin’ done?”
God says, “Out on Highway 61”

Outcome probabilities

Continuing from the previous post, where I discussed earth’s recent surface temperature increase hiatus/slowdown/backoff/vacation.

Well not really–I discussed instead the very closely related topic of enumerating the outcomes of a given probabilistic process. And actually not so much a discussion as a monologue. But couldn’t somebody please address that other issue? It’s just been badly neglected… :)

Anyway, enumerating the possible allocations of n objects into q groups is rarely an end in itself; the probability, p, of each is usually what we want. This is a multinomial probability (MP) problem when q > 2, and a binomial (BP) one when q = 2, in which we know a priori the per-trial p values and want to determine the probabilities of the various possible outcomes over some number, n, of such trials. In the given example, the per-trial probabilities of group membership are all equal (1/6) and we want to know the probability of each possible result from n = 15 trials.

One has to be careful in defining exactly what “trials” and “sample sizes” constitute in these models though, because trials can be nested. We could, for example, conduct n2 = 100 higher-level trials, in each of which the results from n1 = 2 lower-level trials are combined. This is best exemplified by Hardy-Weinberg analysis in population genetics: a lower-level trial consists of randomly choosing n1 = 2 alleles from the population and combining them into a genotype. This is repeated n2 times and the expected genotype frequencies, under certain assumptions about population behavior, are then compared to the observed, to test whether the assumptions are likely met or not. If only two possible outcomes of a single trial (alleles in this case) exist in the population, the model is binomial, and if more than two, multinomial.
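A small numerical sketch of that nested-trial structure, with an allele frequency assumed purely for illustration: each lower-level trial draws n1 = 2 alleles, so the expected genotype frequencies are binomial.

p = 0.7; q = 1 - p                            # assumed allele frequencies
exp.geno = dbinom(0:2, size = 2, prob = p)    # genotypes with 0, 1, 2 copies of the p allele
names(exp.geno) = c("qq", "pq", "pp")         # i.e., q^2, 2pq, p^2
exp.geno * 100                                # expected counts from n2 = 100 higher-level trials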

There are two types of MP/BP models, corresponding to whether or not group identity matters. When it does, the BP/MP coefficients determine the expected probabilities of each specific outcome. For n objects, q groups and group sizes a through f, these equal the number of permutations, as given by n! / (a!b!c!d!e!f!), where “!” is the factorial operator and 0! = 1 by definition. This formula is extremely useful; without it we’d have to enumerate all permutations of a given BP/MP process. And this would choke us quickly, as such values become astronomical in a hurry: with n = 15 and q = 6, we already have 6^15 = 470 billion possible permutations.
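In R, for the running example (n = 15 trials, q = 6 equally likely groups; the particular allocation below is made up):

sizes = c(5, 4, 3, 2, 1, 0)                        # one specific allocation a..f of n = 15
n = sum(sizes); q = length(sizes)
factorial(n) / prod(factorial(sizes))              # n!/(a!b!c!d!e!f!) = number of permutations
factorial(n) / prod(factorial(sizes)) * (1/q)^n    # probability of this exact labeled outcome
dmultinom(sizes, size = n, prob = rep(1/q, q))     # same probability via the multinomial pmf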

When group identity doesn’t matter, only the numerical distribution is of interest; this decreases the total number of coefficients but increases the value of each of them. For example, in locating the n = 50 closest trees to a random sampling point, one may want to know only the expected numerical distribution across the q angle sectors around the point. In that case, the allocation [2,1,1,0] into groups [a,b,c,d] would be identical to [0,1,1,2] and to 10 others, and these thus have to be aggregated. The number of aggregations is given by the number of permutations of the observed group sizes, which in turn depends on their variation. When all sizes differ, e.g. [0,1,2,3,4,5] for n = 15, the number of equivalent outcomes is maximized, equaling q + (q-1) + (q-2)…+ 1 (in this case, 21) [Edit: oops, confused that with what follows; the number of permutations there is given by q!]. When some but not all of the group sizes are identical it’s more complex, as determined by products of factorials and permutations. When the number of identical group sizes is maximized, the number of equivalent outcomes is minimized, at either q or 1; in this case there are 6 variations of [n,0,0,0,0,0].
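In R, the number of equivalent labeled outcomes for a given numerical allocation is q! divided by the product of factorials of the multiplicities of the repeated group sizes; a quick sketch (allocations made up, q = 6):

equiv = function(sizes) factorial(length(sizes)) / prod(factorial(table(sizes)))
equiv(c(15, 0, 0, 0, 0, 0))    # 6 variations of [n,0,0,0,0,0]
equiv(c(0, 1, 2, 3, 4, 5))     # all sizes differ: 6! = 720 labeled variations
equiv(c(5, 5, 3, 2, 0, 0))     # a mixed case: 180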

To get the desired probability density function, the raw MP/BP coefficients, obtained by either method, are standardized to sum to 1.0.
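A tiny worked case of that standardization, assuming q = 2 equally likely groups and n = 4 objects: the raw binomial coefficients, divided by their sum (2^n), give the probability distribution.

n = 4
coefs = choose(n, 0:n)     # raw BP coefficients for 0..n objects in the first group
coefs / sum(coefs)         # standardized to sum to 1; identical to dbinom(0:n, n, 0.5)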

Next time I’m going to discuss a general solution to the problem of estimating the otherwise unknown relationships between pairs of objects involved in a rate process, such as density, the number of objects per unit area or time. These can be deduced analytically using multinomial and gamma probability models in conjunction.

It will make you call your neighbors over for a party and group discussion. If it doesn’t you get a full refund.

Count ’em

Suppose you have n objects and want to enumerate all possible unique allocations thereof among q groups, where “unique” is defined by numerical allocation only, independent of actual group membership for each object. Each group can take any size from 0 to n, contingent on the total in the other q-1 groups. I’m working on a problem where I have to do this. There might well be an existing method for it, but in my experience one can waste a lot of time looking for such things, and you typically learn a lot more when you have to solve a problem yourself. Anyway, I couldn’t readily find one, so I wrote the R function below. Feel free to use it if it’s, well, useful.

all.breaks = function(n, groups){
 # All unique 2-group allocations of n (each row sorted ascending)
 t1 = cbind(n - 0:n, n - n:0)
 t1 = unique(t(apply(t1, 1, sort)))
 test = groups - 2
  if (test>0){
   for (i in 1:test){
    split = ncol(t1)                                 # split the largest (last) group in each row
    t2a = lapply(t1[,split], function(x) 0:x)        # all ways to break that group into two
    t2b = lapply(t1[,split], function(x) x:0)
    reps = sapply(t2a, length)                       # number of breaks for each row
    t2a = unlist(t2a); t2b = unlist(t2b)
    temp = t(apply(cbind(t2a,t2b), 1, sort))
    t1 = apply(t1, 2, function(x) rep(x,reps))       # replicate each row once per break
    t1 = cbind(t1[,-split], temp)
    t1 = unique(t(apply(t1, 1, sort)))               # re-sort rows, drop duplicates
   } # End for() 
  } # End if()
  t1 = t(apply(t1,1,sort, decreasing=T))             # largest group first, for readability
  colnames(t1) = letters[1:groups]; rownames(t1) = 1:nrow(t1)
  t1                                                 # return the table of unique allocations
 } # End function

Example with 15 objects and 6 groups:

> all.breaks(n=15, groups=6)

     a b c d e f
1   15 0 0 0 0 0
2   14 1 0 0 0 0
3   13 2 0 0 0 0
4   12 3 0 0 0 0
5   11 4 0 0 0 0
6   10 5 0 0 0 0
7    9 6 0 0 0 0
8    8 7 0 0 0 0
9   13 1 1 0 0 0
10  12 2 1 0 0 0
11  11 3 1 0 0 0
12  10 4 1 0 0 0
13   9 5 1 0 0 0
14   8 6 1 0 0 0
15   7 7 1 0 0 0
16  11 2 2 0 0 0
17  10 3 2 0 0 0
18   9 4 2 0 0 0
19   8 5 2 0 0 0
20   7 6 2 0 0 0
21   9 3 3 0 0 0
22   8 4 3 0 0 0
23   7 5 3 0 0 0
24   6 6 3 0 0 0
25   7 4 4 0 0 0
26   6 5 4 0 0 0
27   5 5 5 0 0 0
28  12 1 1 1 0 0
29  11 2 1 1 0 0
30  10 3 1 1 0 0
31   9 4 1 1 0 0
32   8 5 1 1 0 0
33   7 6 1 1 0 0
34  10 2 2 1 0 0
35   9 3 2 1 0 0
36   8 4 2 1 0 0
37   7 5 2 1 0 0
38   6 6 2 1 0 0
39   8 3 3 1 0 0
40   7 4 3 1 0 0
41   6 5 3 1 0 0
42   6 4 4 1 0 0
43   5 5 4 1 0 0
44   9 2 2 2 0 0
45   8 3 2 2 0 0
46   7 4 2 2 0 0
47   6 5 2 2 0 0
48   7 3 3 2 0 0
49   6 4 3 2 0 0
50   5 5 3 2 0 0
51   5 4 4 2 0 0
52   6 3 3 3 0 0
53   5 4 3 3 0 0
54   4 4 4 3 0 0
55  11 1 1 1 1 0
56  10 2 1 1 1 0
57   9 3 1 1 1 0
58   8 4 1 1 1 0
59   7 5 1 1 1 0
60   6 6 1 1 1 0
61   9 2 2 1 1 0
62   8 3 2 1 1 0
63   7 4 2 1 1 0
64   6 5 2 1 1 0
65   7 3 3 1 1 0
66   6 4 3 1 1 0
67   5 5 3 1 1 0
68   5 4 4 1 1 0
69   8 2 2 2 1 0
70   7 3 2 2 1 0
71   6 4 2 2 1 0
72   5 5 2 2 1 0
73   6 3 3 2 1 0
74   5 4 3 2 1 0
75   4 4 4 2 1 0
76   5 3 3 3 1 0
77   4 4 3 3 1 0
78   7 2 2 2 2 0
79   6 3 2 2 2 0
80   5 4 2 2 2 0
81   5 3 3 2 2 0
82   4 4 3 2 2 0
83   4 3 3 3 2 0
84   3 3 3 3 3 0
85  10 1 1 1 1 1
86   9 2 1 1 1 1
87   8 3 1 1 1 1
88   7 4 1 1 1 1
89   6 5 1 1 1 1
90   8 2 2 1 1 1
91   7 3 2 1 1 1
92   6 4 2 1 1 1
93   5 5 2 1 1 1
94   6 3 3 1 1 1
95   5 4 3 1 1 1
96   4 4 4 1 1 1
97   7 2 2 2 1 1
98   6 3 2 2 1 1
99   5 4 2 2 1 1
100  5 3 3 2 1 1
101  4 4 3 2 1 1
102  4 3 3 3 1 1
103  6 2 2 2 2 1
104  5 3 2 2 2 1
105  4 4 2 2 2 1
106  4 3 3 2 2 1
107  3 3 3 3 2 1
108  5 2 2 2 2 2
109  4 3 2 2 2 2
110  3 3 3 2 2 2

The next post will examine the probabilities of each such outcome.

Meandering Numps

If you need something conducive to positive mental health, then you want to check out Meandering Numps ASAP. I’m jealous that I didn’t think of it first. Hats off to Andy–stuff like this makes life livable.


“The Wally was a black beast, and although he had a ferocious demeanor, he was in fact a daft ape, and a massive dope.”

Just one of the cast of characters over at Numps’ place.

Local history, Underground Railroad edition

There was a local history event in the old home town today. Decided to go check it out, as I’m a sucker for these types of things and am rarely disappointed, as was the case again. Learning, for example, that James Ashley, who introduced in Congress the 13th Amendment, which legally abolished slavery, was a local guy. And of the myriad ways there are of hiding and transporting people, and of distracting the runaway slave hounds, when you put your mind to it. And more.

Right in our back yard. We had no idea growing up. But we could name all 50 states…

Mr Douglass was there–and he was impressive.


When you sing you pray twice

John Gorka has long been one of my favorite songwriters–his combination of lyrics, guitar style, humor, and intelligence is equaled by very few (like maybe 2-3 others) in my experience. Had a much tougher go learning this one than expected though–John’s chord progressions can be tricky–but eventually got it. Thanks be to Chordify!

When you sing
You make the world a better place
When you sing
You bring us to a state of grace
When you sing, when you sing
Do you know the joy you bring?
When you sing

Like when you sing in your car
All alone with the radio
Or along with the Sunday songs
Breathe in, breathe out, move on
The song begins in your heart
Respond with your voice
Freedom’s right where you are
You still have a choice

St Francis from Italy, he said long ago
When you sing, you pray twice and
That was, and always will be, so
Don’t worry, you can’t make a sound
To please a stranger’s ear
Do it for others, and you’ll
Sing sweet, loud, and clear
Sweet, loud, and clear

You can sing with your hands
You can sing with your words
Sing with that still small voice
It’ll still be heard
You can sing with your thoughts
You can sing with your gifts
Sing to the weather, ’til the weather lifts

John Gorka, When You Sing

I was free to wonder

I couldn’t bribe a wino on what I used to make
My fortune was as sure as the wind
But I was free to wonder and time was on my hands
It was mine to burn and to bend

Freedom for freedom, call that an even scheme
Give me time to wonder and to dream
I need the money; they’ll take the time
Down to the land of the bottom line

Then there came a chance to make some steady dough
Bouncing up my alley to the door
You fill your clothes with keys, and damned responsibilities
You trade in the maybe for the sure

Freedom for freedom, call that an even scheme
Give me time to wonder and to dream
I need the money; they’ll take the time
Down to the land of the bottom line

Land of the Bottom Line, John Gorka

Shakespeare from the natives

An interesting incident in connection with the Bradstreet expedition was a journey undertaken by Captain Morris…Under instructions from his superior he “set out in good spirits from Cedar Point (mouth of the Maumee), Lake Erie on the 26th of August, 1764…and at the same time the army proceeded for Detroit”. He was accompanied by two Canadians and a dozen Indians…The party proceeded up the Maumee to the headquarters of Pontiac, whose “army consisting of 600 savages with tomahawks in their hands”, surrounded him….At Fort Wayne another rabble of Indians met the embassy in a threatening manner, but Morris remained in a canoe reading “The Tragedy of Anthony and Cleopatra” in a volume of Shakespeare which had been presented to him by an Indian Chief. This was undoubtedly one of the strangest circumstances under which the works of Shakespeare were ever perused.

Winter, N.O. 1917. A History of Northwest Ohio, vol. 1

Without saying a word

It’s amazing how you can speak right to my heart
Without saying a word, you can light up the dark
Try as I may I could never explain
What I hear when you don’t say a thing

The smile on your face lets me know that you need me
There’s a truth in your eyes sayin’ you’ll never leave me
The touch of your hand says you’ll catch me whenever I fall
You say it best…when you say nothing at all

Alison Krauss

Karl et al., once again

I’m just going to pick up here from the update posted at the top of the previous post. Read the previous two posts if you need the background on what I’m doing.

I was fortunately able to get the global mean annual values for NOAA’s ERSST version 4 data from Gavin [edit: this must actually be merged land and ocean, or MLOST, not ERSST data]. Here’s the same analysis I did previously, using those data. I also realized that the data at the NOAA web page included the year-to-date value for 2015 (which so far is apparently record warm); I removed that year here. As before, there’s no significance testing here, which is trickier than it might appear, and than I have time for. (Note that Karl et al. did not test for significance either.)

First, here’s a graph of the two time series (previous ERSST version = black, new data = blue). The graph appears to be identical, or nearly so, to Karl et al.’s Fig 1a:

Here’s where it gets interesting. Visually, there’s clearly little difference between the two series; their correlation is about 0.995. But…when I run the same analysis on the new data that I did on the previous version, the results (right-most two columns) are very different indeed:

   Start1 End1 Start2 End2  Slope1 Slope2  Ratio Previous
2    1911 1997   1998 2012 0.06933 0.0855 1.2332   0.4995
3    1911 1997   1998 2014 0.06933 0.1063 1.5325   0.8170
4    1911 1999   2000 2012 0.07252 0.0920 1.2685   0.5422
5    1911 1999   2000 2014 0.07252 0.1162 1.6018   0.9033
6    1931 1997   1998 2012 0.06517 0.0855 1.3120   0.5821
7    1931 1997   1998 2014 0.06517 0.1063 1.6305   0.9522
8    1931 1999   2000 2012 0.07071 0.0920 1.3011   0.5999
9    1931 1999   2000 2014 0.07071 0.1162 1.6430   0.9995
10   1951 1997   1998 2012 0.10488 0.0855 0.8152   0.3647
11   1951 1997   1998 2014 0.10488 0.1063 1.0131   0.5966
12   1951 1999   2000 2012 0.11211 0.0920 0.8206   0.3797
13   1951 1999   2000 2014 0.11211 0.1162 1.0363   0.6327
14   1971 1997   1998 2012 0.17069 0.0855 0.5009   0.2162
15   1971 1997   1998 2014 0.17069 0.1063 0.6225   0.3536
16   1971 1999   2000 2012 0.17887 0.0920 0.5143   0.2274
17   1971 1999   2000 2014 0.17887 0.1162 0.6495   0.3788

All of the ratios are now higher; that is, there is less of a difference in slope between the post- and pre-breakpoint time periods. This is regardless of start, break, or end years. Most of the ratios are now > 1.0; before, none of them were. For the 1951 start date emphasized the most by Karl et al., and ending in 2014, there is near equality in slopes, just as they state. If one ends instead at 2012, the recent interval’s rate is just over 80% of the earlier, much higher than with the previous version data, where it was 36-38%. Choice of start year has a large effect: the highest ratios arise using a 1931 start and the lowest from a 1971 start. A 1971 start date gives the largest discrepancy in rates among any of the four tested; there has clearly been a slowdown if one starts there, and there’s just as clearly been an increase if one starts from 1931. It washes out as a draw if one starts from 1951.

The only point of real contention now is the last sentence in the paper: “…based on our new analysis, the IPCC’s (1) statement of two years ago – that the global surface temperature “has shown a much smaller increasing linear trend over the past 15 years than over the past 30 to 60 years” – is no longer valid.” Comparing a 30 to 60 year interval with a sub-interval of it is not the proper comparison. You have to compare non-overlapping intervals, and if you start those from 1971, then yes, there definitely has been a slowdown, based on these data.

What’s interesting–and unexpected–to me is how such a very small difference in the two time series can have such a large impact on the ratio of warming rates in the two time periods. When I first graphed the above, my first thought was “nearly identical, not going to affect the rate ratios much at all”.


p.s. On the NOAA data issue–I noticed that one can in fact get version 4 data from NOAA, just not the spatio-temporally aggregated versions, which, as mentioned, have incorrect links to the previous version. You have to be willing and able to do your own aggregating, but if you are, you can get both ASCII and NetCDF format, by month and grid cell.

Karl et al 2015, updated analysis

Update: As mentioned in the piece below, I had suspicions that the data at the NOAA ERSST v4 webpage were not the correct (version 4) data, and it turns out that this is in fact the case. I learned this after Gavin Schmidt of NASA forwarded me data he had been sent directly by NOAA personnel. So, I need to redo the analysis below with the version 4 data when I get time. Needless to say, NOAA needs to put correct data on their web pages, especially when it forms the basis of a paper making a claim of this magnitude. It’s also telling, and depressing, that of the large number of scientists commenting on this paper, none had apparently gone to get the actual data with which to check Karl et al.’s analysis. If they had, they’d have noticed the discrepancy against what Karl et al. report.
End Update

And the situation’s worse than I had first thought. Unfortunately.

As I mentioned in the update to previous post, Gavin Schmidt of NASA brought to my attention on Twitter the fact that you cannot average trend rates obtained via Ordinary Least Squares (OLS), between two consecutive time periods, and assume that value will equal the trend computed over the combined periods. I was confused by this at first, but Gavin made a plot showing exactly what the issue is:
[Figure: Gavin’s illustration of OLS trend averaging]

Nothing forces the two trend lines to be continuous at the junction of the two intervals, and a big jump or drop in the data at or near that junction, with suitably short sample sizes, will cause this to happen. A slight variation of this is just what I did in comparing trend rates over the time intervals that Karl et al discussed. I did so because they did not include the rates, in their Table S1, for all of the intervals they discussed, so I had to make algebraic estimates. [Note that Table S1 is in fact a big problem, as it gives a hodgepodge of time periods that do not in fact allow one to make a direct and clear comparison of rates between consecutive time intervals, the very point of the paper. And their Figure 1 and discussion do not help.]

So, here I compute and report the OLS-derived trend estimates for Karl et al.’s variously defined intervals, and some others they didn’t discuss, using the new NOAA data directly. These intervals are defined by: (1) four start dates (1911, 1931, 1951, 1971), (2) two breakpoint years (1998 and 2000), and (3) two end years (2012 and 2014); this creates 16 possible pairs of time intervals. For each I compute the OLS-derived trend (degrees C per decade) for the two consecutive intervals, and the ratio between them, as the second (“slowdown”) over the first (pre-“slowdown”). [Note that the mismatch issue arises because an OLS-derived straight line is the “stiffest” possible model one can fit to a time series. Using something more flexible, like lowess or a polynomial model, would lessen this problem, but then you run into other issues. Since OLS is “BLUE” (Best Linear Unbiased Estimator), minimizes the possibility of over-fitting the data (very important here!), and was what Karl et al used, I stick with it here. Also, I am only presenting data on the trend estimates themselves, not on their probabilities relative to a null, or any other possible, model. Doing that is trickier and more time consuming than people might think, time I don’t have right now.]
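Here’s a small self-contained illustration of that mismatch (mine, with made-up numbers; not Gavin’s plot or the actual GMST data): a step change near the break point leaves both sub-interval OLS slopes near zero while the combined-interval slope is clearly positive.

set.seed(2)
yr = 1:40
y = c(rep(0, 20), rep(1, 20)) + rnorm(40, sd = 0.1)   # flat, a jump at the break, flat again
slope = function(x, y) unname(coef(lm(y ~ x))[2])
slope(yr[1:20],  y[1:20])     # ~ 0 over the first sub-interval
slope(yr[21:40], y[21:40])    # ~ 0 over the second
slope(yr, y)                  # clearly positive over the combined interval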

First, the new NOAA annual GMST data, from 1880 on, look like this:

Second, the trend analysis results:

   Start1 End1 Start2 End2 Slope1 Slope2 Ratio
2    1911 1997   1998 2012 0.0685 0.0342 0.499
3    1911 1997   1998 2014 0.0685 0.0560 0.817
4    1911 1999   2000 2012 0.0719 0.0390 0.542
5    1911 1999   2000 2014 0.0719 0.0649 0.903
6    1931 1997   1998 2012 0.0588 0.0342 0.582
7    1931 1997   1998 2014 0.0588 0.0560 0.952
8    1931 1999   2000 2012 0.0650 0.0390 0.600
9    1931 1999   2000 2014 0.0650 0.0649 1.000
10   1951 1997   1998 2012 0.0938 0.0342 0.365
11   1951 1997   1998 2014 0.0938 0.0560 0.597
12   1951 1999   2000 2012 0.1026 0.0390 0.380
13   1951 1999   2000 2014 0.1026 0.0649 0.633
14   1971 1997   1998 2012 0.1584 0.0342 0.216
15   1971 1997   1998 2014 0.1584 0.0560 0.354
16   1971 1999   2000 2012 0.1714 0.0390 0.227
17   1971 1999   2000 2014 0.1714 0.0649 0.379

The table shows several important results. First, there is only one case in which the slope of the slowdown period is as large as that of the pre-slowdown period, given by the eighth row of the table (start = 1931, break = 2000, end = 2014). In two other cases the ratio exceeds 0.90 (rows 5 and 7) and in one more case it exceeds 0.8 (row 3). For the 1951 start date most emphasized by Karl (rows 10-13), no ratio higher than 0.63 results, no matter how the intervals are defined. The mean and median of the 16 ratios are both about 0.56, so there’s no skew issue.

If one’s goal is to compare the data against the trends and discussions given in the AR5, that is, to assess the effects of the bias reductions in the new NOAA data on possible rate changes, then 2012 is of course the end year to use. For the two possible break years, those ratios are given in rows 10 and 12, as 0.365 and 0.380 respectively: the “slowdown” interval rates are thus ~ 35 to 40 percent of the pre-slowdown rates. In the previous post, in which I had to estimate the trend rate in the first interval algebraically, I got estimates of about 65-68%. So…the problem’s actually worse than I first estimated, and by quite a bit, 35-40% instead of 65-68%. And there are some serious discrepancies in my trend values vs theirs. For example, Karl et al report the trend from 2000 to 2014 as 0.116; I get 0.0649. For 1998-2012 they report 0.086 and I get 0.034. The numbers I’m getting are much closer to their reported “old” (bias uncorrected) values, than to their “new” ones, but they don’t exactly match those either. And the data link names provided by NOAA at their ERSSTv4 page do not seem to refer to version 4. So, some problems here.

If one’s goal is instead to assess what the combined effects of the bias reductions and the extra two years of data is, relative to the general question of whether the rate of global warming has slowed over the last ~1.5 decades or not, then rows 11 and 13, which go to 2014, give higher estimates: 60 to 63 percent post/pre. So those two years of extra data heavily affect the answer to that general question. This same basic story holds true for other start dates. And the further back you push the start date, the less the rate difference between the two periods, for any given choice of end and breakpoint dates.

Again, no significance testing here. It is important to remember, though, that each year’s GMST value derives from many hundreds of data points, even if reduced to an effective sample size by accounting for autocorrelation, which certainly exists. I very much doubt that any but the very highest ratios given here would prove statistically insignificant if the trend rates were computed from those raw data instead of from the annual means; even using the latter, many still would not be.

So, these results seem to make the principal conclusion of Karl et al, that there has been no slowdown in the rate of global warming, even more suspect than I reported in the previous post. The bias reductions and extra years’ data reduce the differences, but they most certainly do not eliminate them.

The R code for the analysis is below

## 1. Get and read data
# The ERSSTv4 GMST data from 1880 on is at:
# Download data and set appropriate working directory
T = read.delim("NOAA ERSST aravg.ann.land_ocean.90S.90N.v3.5.4.201504.asc.txt", sep="", header=F)

## 2. Define time periods: four start years; breakpoint year at either 1998 or 2000; end year at either 2012 or 2014
# breakpoint year always included in *2nd* of the 2 intervals
# 1998 and 2000 useful choices--differing global means
# start years: 40 and 20 years before, and 20 years after, Karl et al's emphasis on 1951; thus covers last 100+ years
start.yr = c(1911,1931,1951,1971); break.yr = c(1998,2000); end.yr = c(2012,2014)
dates = expand.grid(start.yr,break.yr,end.yr)
dates[,4] = dates[,2]-1; dates= dates[,c(1,4,2,3)]; names(dates) = c("Start1","End1","Start2","End2")
which.dates = dates-1879; n = nrow(dates)
intvl.1=paste("int",1:n,".1",sep=""); intvl.2=paste("int",1:n,".2",sep="")
for (i in 1:n){
 assign(intvl.1[i], which.dates[i,1]:which.dates[i,2])
 assign(intvl.2[i], which.dates[i,3]:which.dates[i,4])
} # End for()

## 3. Get OLS-derived lm and slope estimates, ignoring possible AC for now, since no p testing
lm.1 = paste("lm",1:n,".1",sep=""); lm.2 = paste("lm",1:n,".2",sep="")
sl.1 = paste("sl",1:n,".1",sep=""); sl.2 = paste("sl",1:n,".2",sep="")

for (i in 1:n){
 assign(lm.1[i], lm(T[get(intvl.1[i]), 2] ~ T[get(intvl.1[i]), 1]) )		# OLS lm fits
 assign(sl.1[i], get(lm.1[i]) [[1]][[2]])							# slope only
 assign(lm.2[i], lm(T[get(intvl.2[i]), 2] ~ T[get(intvl.2[i]), 1]) )
 assign(sl.2[i], get(lm.2[i]) [[1]][[2]])
} # End for()

Ratios = rep(NA,n); Slope1=Ratios; Slope2=Ratios
for (i in 1:n){
 Slope1[i] = get(sl.1[i]); Slope2[i] = get(sl.2[i])
 Ratios[i] =  Slope2[i] / Slope1[i]
} # End for()
final = cbind(dates, Slope1, Slope2, Ratios)
final = final[order(final[,1], final[,2], decreasing=F) ,]
rownames(final) = 1:n; final
plot(T[,1], T[,2], xlab="Year", ylab= "ERSSTv4 GMST (deg. C) rel. to 1971-2000 mean", type="l", lwd=2)