Severe analytical problems in dendroclimatology, part five

OK, posting code here doesn’t work; let’s try this again.

Last week after I put up part four of this series, there was some discussion regarding demonstration/specification of the claims presented there.  I gave it some thought and decided that, since I’ve already described the fundamental issues involved, and framed the approach taken, the cat is really already out of the bag, so I might as well demonstrate it with some code. As mentioned, I could not put up the full code, nor simply extract a snippet from the jungle that it is, so I wrote something much briefer that does the job (I think).  Perhaps it was a needed exercise anyway, to prove to myself that some wires didn’t get crossed somewhere. Discussion is welcome.

The code is here. Please heed the notice at the top!


update 1: I’ve made a few improvements in the code comments and very minor changes in the graph output
update 2: Couple of graphs added, using a = 0 and b = 0.1, for sake of reference/discussion:
update 3: A few more code improvements made, code for a second graph added, other clarifications etc; probably leave it at that.
update 4: a few more changes, as noted in the code, and revised graphs here

Graph5True yearly climate values with linear trend estimate.

Graph6RCS-estimated values and trend estimates (linear and stiff loess).

33 thoughts on “Severe analytical problems in dendroclimatology, part five

  1. You could upload the R code to an ftp site, and post a URL here. Rapidshare is a free site I’ve used in the past (albeit rarely). I’m sure there are many others.

    • I forgot to mention, that for those new to R, the package “dplR” has to be installed before the script will run. Package installation is easy in R: choose packages:install package(s)… from the top menu bar, choose a CRAN mirror at the prompt, and then scroll through the (long) alphabetical list of packages and choose dplR. It will install automatically. It is then called and loaded from within the code.

  2. The code is well commented and does indeed match your verbal description–except for the discrepancies in that description that prompted me to attempt to elicit clarification by restating it mathematically last week.

    Two things that don’t stand out for me as well as they might, though, are (1) whether you are arguing that the trend difference results from some failure of the conventional RCS procedure to recognize the inverse-square-root relationship between ring width and tree age (or, more properly, and tree cross-sectional area) and (2) the degree to which the trend discrepancy detracts from the reconstruction that will be based on the chronologies.

    Just in case you are making that first argument, I’ll mention that it’s not clear to me that the RCS curve misses the inverse-square-root relationship; it appears merely to have erroneously superimposed a trend on that relationship–which does not appear startling; one would expect that a trend uniformly present in the RCS “training” would to some degree be suppressed by application of the resultant curve to the raw data, independently of whether the trees exhibit the inverse-square-root relationship you have pointed out. In fact, I guess I was surprised that the suppression isn’t greater.

    As to the second point, a reconstruction based on the trend-suppressed chronologies may, for all you’ve shown so far, do just fine, thank you, at least for times that do not precede the in-sample period by too much. Or perhaps you plan in your next installment to prove just the opposite.

    • Thanks Joe.

      Two things that don’t stand out for me as well as they might, though, are (1) whether you are arguing that the trend difference results from some failure of the conventional RCS procedure to recognize the inverse-square-root relationship between ring width and tree age (or, more properly, and tree cross-sectional area) and (2) the degree to which the trend discrepancy detracts from the reconstruction that will be based on the chronologies.

      The main point so far is simply to demonstrate the basic fact that real trend mis-estimates do in fact arise–this of course has to be step one. As to why this occurs, “yes” to your (1), it fails to properly recognize that fact. As to your (2) I mention in the code comments that no calibration/regression, as typically performed, is implemented here, because that step neither adds, nor subtracts from the trend error introduced by the detrending step. So the answer is “a lot”, as is made clear from the graphs.

      Just in case you are making that first argument, I’ll mention that it’s not clear to me that the RCS curve misses the inverse-square-root relationship; it appears merely to have erroneously superimposed a trend on that relationship–which does not appear startling

      It definitely misses it, and badly, that’s the point. How can you say that such a large trend difference is “not startling”–it’s enormous, and it only gets worse as one departs from the ideal growth conditions used here. Are we looking at the same graphs?

      one would expect that a trend uniformly present in the RCS “training” would to some degree be suppressed by application of the resultant curve to the raw data, independently of whether the trees exhibit the inverse-square-root relationship you have pointed out. In fact, I guess I was surprised that the suppression isn’t greater. As to the second point, a reconstruction based on the trend-suppressed chronologies may, for all you’ve shown so far, do just fine, thank you, at least for times that do not precede the in-sample period by too much.

      Why would one have any reason to expect what you claim? Test your idea: use the last 100 years as a pseudo-instrumental/calibration period, to estimate a linear relationship between the ring predictor and the climate predictand, and then see what trend it predicts over the full period.

  3. These things quickly get too complicated for this mode of communication, so I’ll initially deal with only one of the issues, i.e., whether the RCS process “misses” the inverse-square-root relationship.

    By saying it doesn’t, I simply mean that the standardization curve does indeed look like an inverse square root, with a trend imposed. I’ll attempt to explain that further by attaching code, but, against the possibility that WordPress will spit it out, I’ll place the code in a separate response–and it’s possible that this paragraph makes the code unnecessary.

  4. For what it’s worth, here’s the code:

    # Line up the standardization curves.
    ageAligned = matrix(nrow =nrow(rw), ncol = ncol(rw)); # Line up same-age trees
    for(k in 1:ncol(rw)){ageAligned[1:(nrow(rw) – starts[k] + 1), k] = stdcurve[starts[k]:nrow(rw), k];}

    # Get the overall standardization curve by averaging:
    stdcurveavg = apply(ageAligned, 1, mean, na.rm = TRUE);
    plot(stdcurveavg, type = “l”);

    # Compare the result with an inverse-square-root curve
    lines(mean(stdcurveavg[100:500]*sqrt(100:500 – 1/2)) * (seq(1:500) – 1/2)^(-1/2), col = “red”, lty = 2);
    # They should be similar: the process has captured the inverse-square-root curve but superimposed a trend
    # If you repeat the process with the opposite climate trend, you’ll see the same thing but the opposite offset

  5. Oops! I left out the code’s first line, which is:

    stdcurve = rw/detr.rcs.flex; # Each tree’s start-time-offset standardization curve

  6. “Why would one have any reason to expect what you claim? Test your idea: use the last 100 years as a pseudo-instrumental/calibration period.”

    I’m unlikely to get the time to do that soon. However, last week, before I installed dPlr (which I did today), I wrote my own RCS routines and treated individual trees as chronologies (I know, not the same thing), and, depending on which of my RCS routines I used, the results were creditable back to a couple hundred years before the final hundred, in-sample years. But these were just a back-of-the envelope types of assessment, and there was obvious overtraining: the chronology blew up in the earlier years.

    What I would do if I had time would be, in addition to creating an ensemble of chronologies with which to do the reconstruction, would be to perform the same reconstruction operation, but this time with synthetic ring-width data based on the of a linear relationship between sensitivity and age in order to see how that change affects the reconstruction performance.

    But, as I say, I’m just a lawyer, not a scientist, so it may be a good idea for some of the scientists out there to try that instead.

    • I’m unlikely to get the time to do that soon.

      It only takes a few minutes, just compute a linear regression of the climate state on the RCS estimates for the last 100 years and make the prediction. In fact, this is one case where you really can eyeball it and clearly see it’s going to give an inaccurate trend.

      However, last week, before I installed dPlr (which I did today), I wrote my own RCS routines and treated individual trees as chronologies (I know, not the same thing), and, depending on which of my RCS routines I used, the results were creditable back to a couple hundred years before the final hundred, in-sample years. But these were just a back-of-the envelope types of assessment, and there was obvious overtraining: the chronology blew up in the earlier years.

      Why would you do that before first examining the ARSTAN and dplR methods? And no, not the same thing at all.

      But, as I say, I’m just a lawyer, not a scientist…

      Ah, I see.

  7. This blog will be too technical for me, so I will leave after one comment: I spent several months on a Border sheep farm (Netherwitton, Northumberland), and the working collies and their adolescents were far and away the best, most alive, dogs I’ve ever met.

    • They are the best aren’t they! And there will be plenty of non-technical articles here, you can count on it (and a few now in fact). I just need to get this technical stuff out right now.

  8. I tried to steal a few moments this morning to do a reconstruction, but it has now extended to over an hour. Be that as it may, I did finally get one done, but I’ll have to put off doing others with different parameters until later.

    The sole reconstruction I’ve done is almost indistinguishable from the climate variable in the 100-year period that immediately precedes the training period, although there is a just-detectable difference in their trend lines’ slopes: the reconstruction’s slope is slightly lower. Farther back in time, the match deteriorates, with the reconstruction failing to follow peaks very well and the lower-magnitude trend continuing so that, out of the underlying climate-ramp values, which climb from 3 to 6), the reconstruction’s trend line exceeds the climate’s by about 0.6 at the beginning of the reconstruction. In short, although there was indeed some trend suppression in the reconstruction, it wasn’t particularly pronounced.

    I might mention, though, that I was using my code, not yours, and that code used one core for each of fifty trees, required a 25-trees minimum instead of your 6-core minimum, and imposed a minimum age of 15 instead of 3. I’ll change those parameters (except for the 1 core per tree) when I get around to doing more reconstructions.

    Incidentally, you may want to check your code. In the rcs() call, your code has “nyears = (2/3)*nyrs” in one case and “nyears = (1/10)*nyrs” in the slot for the “nyrs” argument, whose default value is NULL. It may be that you end up with the same, NULL value for nyrs in both calls, which I’m guessing is not what you wanted.

    • The first thing you need to do is explain why you’ve decided to invent your own RCS algorithm in the first place, to do these various evaluations you’re doing. The two relevant points in that regard are that (1) dplR and ARSTAN produce very similar results, and (2) both were developed by smart and eminently capable people. It’s not at all clear what you’re doing, or why.

      The sole reconstruction I’ve done is almost indistinguishable from the climate variable in the 100-year period that immediately precedes the training period, although there is a just-detectable difference in their trend lines’ slopes: the reconstruction’s slope is slightly lower. Farther back in time, the match deteriorates, with the reconstruction failing to follow peaks very well and the lower-magnitude trend continuing so that, out of the underlying climate-ramp values, which climb from 3 to 6), the reconstruction’s trend line exceeds the climate’s by about 0.6 at the beginning of the reconstruction. In short, although there was indeed some trend suppression in the reconstruction, it wasn’t particularly pronounced.

      You’re definitely doing something wrong. The high frequency (year to year) variations in particular should closely match reality.

      Incidentally, you may want to check your code. In the rcs() call, your code has “nyears = (2/3)*nyrs” in one case and “nyears = (1/10)*nyrs” in the slot for the “nyrs” argument, whose default value is NULL. It may be that you end up with the same, NULL value for nyrs in both calls, which I’m guessing is not what you wanted.

      OK, now that is helpful, thanks. The RCS “nyrs” argument was indeed misnamed, such that both the stiff and flexible spline fits were defaulting (to the flexible fit, 0.10 of the full timespan). I could not figure out why I was getting identical results for the two different splines in this new code (one shouldn’t) but not in my original code and results (where the naming was correct). See the notes in the updated code–and also I’ve replaced the two graphs in this piece.

      Also, the number of trees and cores per tree doesn’t matter, nor the number of ring measurements to ignore in the early years of each core. In fact, I changed the code slightly so that one doesn’t have to ignore any of them.

  9. Actually, I was wrong about the trends; when I compare the reconstruction trend suppression to the RCS trend suppression, they’re about the same size; so I’ve actually shown that the RCS trend suppression does indeed carry over into the reconstruction.

    Now to change the tree-growth curve to determine whether that is what’s responsible for the trend suppression. I’m guessing no, but I’ve been wrong before.

    • Actually, I was wrong about the trends; when I compare the reconstruction trend suppression to the RCS trend suppression, they’re about the same size; so I’ve actually shown that the RCS trend suppression does indeed carry over into the reconstruction.

      What?? “carries over into the reconstruction”? The “reconstruction” is the very thing we’re focused on here.

      Now to change the tree-growth curve to determine whether that is what’s responsible for the trend suppression. I’m guessing no, but I’ve been wrong before.

      Nope, it isn’t. Instituting a biological trend in the tree growth only makes the problems worse. The “no biological trend” is one of the idealized conditions tested here.

  10. Regarding posting source code on wordpress:

    http://en.support.wordpress.com/code/posting-source-code/

    For example (hope this works):

    # 5. RCS (Regional Curve Standardization) using package dplR (Bunn et al, 2012)
    library(dplR)	 # dplR must be installed first
    corenames <- colnames(rw)	
    offsets <- data.frame(corenames,rep(1,ncol(rw)))	 # dplR requires minimum pith offset values = 1; so do, i.e. no mis-estimation of ring age
    colnames(offsets) = c("ID","offset")
    
  11. You’ll want to check this, but I think I may have been right in suspecting that trend suppression is inherent in the RCS process itself, independently of whether it is the area increment or the radius increment that is linear in the climate variable.

    Let me try to say that more precisely:

    In your simulation, you are assuming that the rate of tree-cross-sectional-area growth responds more or less linearly (within a narrow range of climate-variable values) to the climate variable , i.e., that the sensitivity of ring width to the climate variable is inversely proportional to the square root of that area. And, if I understand you correctly, it is this feature of tree growth that you consider responsible for the fact that reconstructions based on a record having an overall trend will tend to suppress that trend. I suppose that ,in accordance with this theory, If it were the rate of ring-width growth rather than the rate of area growth that responded linearly to the climate variable, then the trend suppression would not occur.

    Now, I’m putting words in your mouth here; perhaps you didn’t mean that at all, in which case the following observation will be inapposite: When I grow the trees in accordance with dR/dt = m * C + b, where R is the tree radius and C is the climate variable, I get at least as much trend suppression in the reconstruction as I did when I grew the trees in accordance with dA/dt = m * C + b, where A is the tree area. That is, the magnitude of the climate-variable-value trend line’s slope I infer from the tree-ring widths still is less than the slope magnitude in the “actual” climate-variable values on which I based generation of those tree-ring values, and the discrepancy is every bit as great as it was in the dA/dt = m * C + b case.

    • You’re not getting a few things here.

      I appreciate your interest, but your various assumptions about what I have and have not said and concluded, along with insinuations that I’ve gotten it wrong, and that a hobbyist with a few hours to kill is going to show me exactly why, among other things, has let’s call it, a “tendency to induce aggravation”. So let’s back up a few steps OK?

      The goal here is to proceed methodically through the technical points of these issues I’ve raised in this series, so that as many people as possible can (hopefully) follow along. It should be apparent, that having now written four increasingly technical pieces, that I likely have not said everything that I’m going to say on these topics, especially on this second issue of the three.

      So one step at a time, I’ll be getting to the issues you’ve raised here. For now I’ll just say that (1) that the problem is inherent to the RCS algorithm is exactly what I’m saying and (2) you’re not recognizing that one needs to clearly distinguish between geometric and truly biological effects in accounting for possible biases induced by the detrending process, and no matter what biological relationship relationship between climate and ring response is posited, the detrending algorithm cannot definitely discriminate the non-climatic from the climatic, in terms of long term trends.

  12. Jim Bouldin: “The first thing you need to do is explain why you’ve decided to invent your own RCS algorithm in the first place, to do these various evaluations you’re doing. The two relevant points in that regard are that (1) dplR and ARSTAN produce very similar results, and (2) both were developed by smart and eminently capable people. It’s not at all clear what you’re doing, or why.”

    Actually, I have been using dpIR ever since you posted your code. I had used a quick-and-dirty approach I knew I’d have to change, and dpIR saved me the trouble. But I saw no reason to change the rest of my code, in which I generate a temperature sequence and grow trees in response, since it has a a bunch of parameters I can use to vary things conveniently when I get around to it.

    Since then I added code to generate multiple sets of tree-ring data from the same climate data independently, make a respective chronology from each set, use, the last, say, 100 years of that ensemble of chronologies to train a linear model, and then use that model to base a reconstruction of that climate variable’s previous-year values on the chronology ensemble’s previous-year values.

    Jim Bouldin: “What?? “carries over into the reconstruction”? The “reconstruction” is the very thing we’re focused on here”

    Well, ultimately, yes, but the code you posted merely creates a chronology from a stand of trees. Yes, you’d hope that the chronology matches the temperature in the idealized case, and it largely does, except for the discrepancies you’ve pointed out. But the posted code doesn’t then use some sub-record of the climate variable and of a plurality of such chronologies to compute respective weights by which to attempt to reconstruct the remainder of the climate-variable record from the remainders of the chronology records. That’s the reconstruction that I was talking about.

    Jim Bouldin: “You’re definitely doing something wrong. The high frequency (year to year) variations in particular should closely match reality.”

    Actually, the high-frequency components in my chronologies do indeed match “reality,” and those components in my reconstructions’ later out-of-sample years do, too. But in the reconstruction the fidelity deteriorates as the year gets farther removed from the training period. So, although I will be very surprised if I haven’t done something wrong somewhere, that I have is not evidenced by a failure of my chronologies’ high-frequency components to match the climate variable’s.

    • There’s no way we can have a productive discussion with you continually referencing your mystery code, not knowing what it is or why exactly you’re doing what you’re doing, especially given the problems with what little you did post earlier. If I can’t follow you, I’m pretty sure that very few others can either. It would be better for everyone if you just asked questions or made comments about what I’ve presented.

      Yes, you’d hope that the chronology matches the temperature in the idealized case, and it largely does, except for the discrepancies you’ve pointed out.

      NO IT DOES NOT. Why do you say such a thing, in direct and obvious contradiction to the evidence presented that everyone can clearly see? The long term trend is clearly, and badly, mis-estimated.

      But the posted code doesn’t then use some sub-record of the climate variable and of a plurality of such chronologies to compute respective weights by which to attempt to reconstruct the remainder of the climate-variable record from the remainders of the chronology records. That’s the reconstruction that I was talking about.

      Right, and it also doesn’t solve the problem of world hunger or cure Down’s syndrome, which surprisingly enough, I never even brought into the discussion.

      Actually, the high-frequency components in my chronologies do indeed match “reality,” and those components in my reconstructions’ later out-of-sample years do, too.

      That’s not what you said earlier when you stated: “Farther back in time, the match deteriorates, with the reconstruction failing to follow peaks very well

    • I’d be happy to post the code, but the part that corresponds to Mr. Bouldin’s posted code is actually no different substantively from the code that’s posted–except that (1) it includes a number of additional bells and whistles, most of which I haven’t tried out yet and therefore likely still need some debugging, (2) it’s not commented nearly as well as Mr. Bouldin’s, and (3) I haven’t yet gone through and pruned out the codes snippets I don’t use. I wrote most of it before Mr. Bouldin changed his mind and posted his, and I replaced one section of mine with one section of his. In short, it would probably be confusing.

      I know that this is contrary to what one might infer from Mr. Bouldin’s reactions, but I think that if you read my comments carefully you’ll find it’s consistent with what I actually said. (You may also want to compare the math and code I did post with Mr. Bouldin’s comments about them and draw your own conclusions.)

      Anyway, I don’t think posting my code would be helpful. Still, if anyone really thinks he would benefit–and I question whether he would–I’ll post it.

  13. Jim Bouldin: ” If I can’t follow you, I’m pretty sure that very few others can either.”

    Yes, this is all likely confusing to the observer. Although I think much of the problem stems from Mr. Bouldin’s seeming inability to take yes for an answer and his failure to read what I said carefully, the fact is that I bear considerable responsibility, too, since I (1) unhelpfully injected a red herring caused by my too-cursory eyeballing of my early results and (2) somehow became possessed of the erroneous notion that Mr. Bouldin was blaming RCS shortcomings largely on failure to take into account ring-width growth’s dependence on tree size: in that case, it was I who misinterpreted what he was saying. So, for the benefit of any observer whom my remarks (or, more likely, Mr. Bouldin’s responses to them) may have given the wrong impression, I will state that nothing in my replication efforts gives me any reason to disagree with anything in the bodies of (as opposed to some of his answer responses in) this series of five posts.

    To the extent that it is accurate—and I have no reason to suspect it is not—I found this series of five posts be a very valuable and clear summary of one dendroclimatology-art aspect, namely, the generation of “chronologies,” from which workers in this field attempt to infer the values of temperature (and possibly other climate variables) that prevailed before the thermometer-record era. In addition to getting an overview generally, I was particularly happy to learn (1) what is meant by a “chronology,” (2) that it is ring area rather than width that tends to be more directly proportional (in limited ranges) to temperature in temperature-limited-regime trees, and (3) that the dplR package for R exists and how to use it.

    In particular, I have found that, as represented by his posted simplified code, his simulation method is in fact well described by the fourth post’s verbal presentation. Just in case his references to the seemingly unnecessary use of more than one core per tree in the simulation make you, as they did me, question whether I had understood his description correctly, I’ll confirm that, yes, his code does employ that feature—even though adding an additional core to each tree has no effect at all other than computational cost on the simulation as presented in the posted code. (However, using an additional core per tree presumably would have an effect if, e.g., the inter-tree growth-rate variation mentioned in connection with Part 4’s Fig. 3—but not included in the posted code—were actually an inter-core variation.) Also, do not be led astray, as I was, by his statement that my mathematical description of the tree-growth operation is incorrect; it isn’t. His code (in its Part 4b) differs from my math only in the former’s inclusion of a factor of pi, which is computationally unnecessary but helpful from a code-documentation standpoint. (That is, rather than being “unnecessarily complex,” my math was actually very slightly simpler than what he used.)

    However, I ended up confusing the issue by unadvisedly making the speculation—based on too-cursory an eyeballing of my results—that a subsequent reconstruction of the climate variable based on chronologies prepared by the RCS methods this series of posts describes might end up being accurate despite the shortcomings those posts point out. As Mr. Bouldin correctly implied, there is no reason to expect such a result, and, as soon as I looked more closely, I immediately reported that in fact I had not observed it. (Unfortunately, Mr. Bouldin would not take yes for an answer, and an unedifying colloquy has followed.)

    In short, I found the main posts in this series an excellent can opener into the subject for a layman like me, and, after replicating the work, I have found almost nothing inconsistent with what the main posts say, despite what one might infer from Mr. Bouldin’s reaction.

Leave a reply to Joe Born Cancel reply