A massive mess of old tree data

I’m going to start focusing more on science topics here, as time allows. For a while I’ll concentrate on some forest ecology topics that I’ve been working on, or that are closely related to them.

I’m working on some forest dynamics questions involving historical, landscape-scale forest conditions and associated fire patterns. I just finished assembling a tree demography database of about 130,000 trees from about 1,700 plots sampled in the early 20th century on the Eldorado and Stanislaus National Forests (ENF, SNF). These two National Forests occupy the mid- to upper elevations of the relatively gradual western slope of the central Sierra Nevada. The data were collected primarily between 1911 and 1923 as censuses of plots that were large by today’s standards (about 2 or 4 acres each). They were part of the first USFS timber inventories, when the agency was still trying to figure out just what it had on its hands and how it would manage it over time. An enormous amount of work went into this effort, but apparently only a small part of the resulting data has survived.

The data are “demographic” in that the diameter and taxon were recorded for most trees, which makes them useful for a number of analytical purposes in landscape, community and population ecology. They come from two datasets that I discovered between 1997 and 2001, one in the ENF headquarters building and the other in the National Archives facility in San Bruno, CA. For each, I photocopied the data at the time and had some of it entered into a database, hoping that I would eventually have time to analyze them. For the ENF data this was a fortunate decision, because the ENF, as I later learned, has managed in the meantime to lose the entire data set, most likely along with a lot of other valuable material that was in the office housing it. I thus now have the only known backup. That time to analyze them finally came, but the data were in such a mess that I first had to spend about three months checking and cleaning them. The data will soon be submitted as a data paper to the journal Ecology, one of the very few journals to have adopted this new paper format. In a data paper, one simply presents and describes a data set deemed to be of value to the general scientific community. There is in fact a further mountain of data and other information beyond these, but whether they’ll ever see the light of publication is uncertain.

An example: the first page of one of the many old field reports and data summaries involved

We, and others, are interested in these data for estimating landscape-scale forest conditions before they were heavily altered by humans, primarily through changes to natural fire regimes, logging, and grazing. These changes began in earnest after about 1850 and have generally increased with time. This knowledge can help inform some important current questions involving forest restoration and general ecosystem stability, including fire and hydrologic regimes, timber production potential, biological diversity, and spin-off topics like carbon dynamics. The data can also directly address some claims made in certain recent papers regarding pre-settlement fire regimes in California and elsewhere.

The data assembly was much slower and more aggravating than expected (I won’t go into it, but I’ll never do it again), but the analysis is, and will remain, very interesting for quite some time, as much can be done with it. Some of the summary and explanatory documentation associated with the data is entirely fascinating, as is some of the other old literature and data that I’ve been reading as part of the project. In fact I’m easily distracted into reading more of it than is strictly necessary, but doing so has reminded me that a qualitative, verbal description can be of much greater value than actual data, depending on the scientific situation. Possibly the most interesting and important aspect of all this is the degree to which really important information has been lost, completely forgotten, or never discovered to begin with. This is not trivial: I’m talking about a really large amount of detailed data and extensive, detailed summary documentation. Early views and discussions regarding fire and forest management, and the course these should take in California, are extensive and very revealing as we look back 100 years later on the effects of important decisions made then. There are also lessons in federal archiving and record keeping.

I’ll be posting various things as time allows, including discussions of methods and approaches in this type of research. I’m also applying for a grant to cover the cost of free pizza at the end, although to be honest I’ve not had great success with such grants in the past. You might be surprised at the application numbers and success rates on that kind of thing.


4 thoughts on “A massive mess of old tree data”

  1. Great work Jim, this kind of archaeo-data mining is absolutely vital for gaining that long term perspective. There must be huge amounts of this kind of information sitting rotting in old archives.

    • Hey thanks Jeff, appreciate the good word. Yes, I agree that it is very important work and not nearly enough of it is getting done. Valuable archives are disappearing over time; I’ve experienced it firsthand. I think the incentives and rewards in science work against it: lots of work and not much reward. It’s a big problem.

  2. I have to agree with Jeff… this looks like a pretty cool find and your efforts to rehabilitate the data should bear some interesting fruit.

    Are there time series measurements (repeated measures on plots); and are there data like tree ring measurements (potentially from other surveys) that might serve as correlates or benchmarks?

    How much “end of project” pizza are we talking about here? If we’re not trying to feed too many, and if the lab happens to be in central Ohio at the right time, give me a shout and we’ll see if we can’t oblige.

    • Hey Clem, thanks for the good word.

      No time series involved here: these plots were apparently censused only once, with the exception of 20 that I re-measured about 10 years ago in Yosemite. I will be pushing to get some of the plots established as permanent, long-term monitoring plots. For the existing data, however, everything revolves around spatial analysis, as these plots are spread over a wide area that includes a large elevation gradient (about 5,500 feet of rough, mountainous terrain). I will, though, be doing some time series analysis to estimate tree ages at the time of data collection, based on growth rate data from various early sources, which I also discovered during this whole process.

      Yes, there are several other data and information sources that can be used for comparison, but there is no directly comparable data set, as this was an unparalleled effort in the history of American forest inventory. The most important of these sources are the General Land Office bearing tree (BT) data from the 1880s, which are distance-based (plotless) tree measurements. Those data require special distance-based statistical methods that are tricky and easily misunderstood. As a consequence, the BT data have been badly analyzed in the past few years, in very prominent ecology journals, to make highly dubious arguments regarding pre-settlement forest conditions in the Sierra Nevada and the associated natural fire regimes.

      Oh and if the pizza grant falls through I’ll be calling on you to bring some roasted soybeans, with pepperoni preferably.

