# On trend estimates, part three

Sometimes when you do science, it’s all immersion in the literature: reading what others have done. Other times it’s about exploring things on your own, maybe out of curiosity, maybe because you’ve got the tools to do it, maybe because you get some important insights faster that way, sometimes much faster. There are lots of possible reasons. Lately I’ve been exploring trend estimation issues in R, probably because I’ve been doing a lot of work on trend estimation.

I’ve discussed the fact that the least squares estimator (“OLS”) for linear regression trend estimation does not always appear to be best when the data are highly autocorrelated. I gave previous examples with an extremely simple estimator, based on the mean of the lag-1 slopes, that gives a better answer in those situations. As Nick Stokes and HaroldW pointed out, this equates simply to the difference, last value minus the first (divided by the number of time steps); can’t get much simpler than that. Since then, I’ve done some more exploring of exactly the conditions under which that one, and a few others, do better or worse than OLS. [Incidentally, one of them is a parabolic weighting as described mathematically by HaroldW, and sure enough, it follows the OLS estimate almost exactly. My thanks to him for pointing out that interesting fact.]
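The equivalence between the mean of the lag-1 slopes and the endpoint difference can be checked in a few lines. This is only a sketch: the function names and the sample series are made up for illustration, and unit time steps are assumed.

```python
def mean_lag1_slope(y):
    """Mean of the first differences y[t+1] - y[t] (unit time steps assumed)."""
    diffs = [b - a for a, b in zip(y, y[1:])]
    return sum(diffs) / len(diffs)


def endpoint_slope(y):
    """Last value minus first, divided by the number of steps."""
    return (y[-1] - y[0]) / (len(y) - 1)


# Illustrative values only, not data from the post.
series = [3.0, 4.5, 2.0, 6.0, 7.5]
print(mean_lag1_slope(series), endpoint_slope(series))
```

The interior values cancel when the differences are summed, which is why only the two endpoints matter.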

This issue is important because it affects confidence in trends in autocorrelated data, of which there are a lot. It’s well known that positive autocorrelation causes standard methods to understate the standard errors and confidence intervals around an estimate, which exaggerates the statistical significance of the computed values. There are ways to correct for this (see note at end on that). However, there is another problem here: the OLS-derived estimate of the most likely (point) value is itself often thrown off, i.e. it is biased. So, you can correct for overly narrow confidence intervals, but you will often be doing so around a point estimate that is itself mis-estimated. The correction procedures give a more realistic confidence interval, but that interval surrounds a biased value. The value of doing this is therefore questionable.

The good news is that it appears the problem only gets serious when the autocorrelation is toward the high end. Below a lag-1 alpha coefficient of about 0.75 in the AR1 model (x(t) = alpha * x(t-1) + w, where w is white noise), the OLS estimator appears to be as good as or better than the few others I’ve tested. Above 0.75, however, one or more of these estimators give better estimates than OLS, as judged by both the accuracy and the precision (variance) of the mean estimate. When a random walk state is reached (alpha = 1.0) the difference is substantial and sustained. The most extreme OLS estimates are the ones having the greatest bias. This is true for alpha < 0.75 also, and the gray areas all involve the various possible combinations of lower AR1 alpha values with the higher percentiles of the OLS-derived slope distribution.
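A comparison of this kind can be sketched with a small Monte Carlo experiment. To be clear about assumptions: this is not the simulation behind the results in this post. The series length (100), number of runs, seed, unit-variance Gaussian noise, the start value x[0] = w[0], and the choice to compare OLS against only the endpoint-difference estimator are all mine, and the alpha at which one estimator overtakes the other may depend on them.

```python
import math
import random


def ar1_series(n, alpha, rng):
    """Zero-trend AR(1) noise: x[t] = alpha * x[t-1] + w[t], w ~ N(0, 1), x[0] = w[0]."""
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(alpha * x[-1] + rng.gauss(0.0, 1.0))
    return x


def ols_slope(y):
    """Least squares slope of y against t = 0, 1, ..., n-1."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    return sxy / sxx


def endpoint_slope(y):
    """Last value minus first, divided by the number of steps."""
    return (y[-1] - y[0]) / (len(y) - 1)


def rmse_by_estimator(alpha, n=100, runs=2000, seed=1):
    """RMSE of each estimator about the true trend (zero) over `runs` series."""
    rng = random.Random(seed)
    sq_ols = sq_end = 0.0
    for _ in range(runs):
        y = ar1_series(n, alpha, rng)
        sq_ols += ols_slope(y) ** 2
        sq_end += endpoint_slope(y) ** 2
    return math.sqrt(sq_ols / runs), math.sqrt(sq_end / runs)


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 0.75, 0.9, 1.0):
        ols, end = rmse_by_estimator(alpha)
        print(f"alpha={alpha:4.2f}  OLS RMSE={ols:.4f}  endpoint RMSE={end:.4f}")
```

Because the true trend is zero, the root mean squared slope combines any bias and the spread of each estimator into one number, so the two can be compared side by side at each alpha.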

Here are two results from 1000 simulation runs for alpha = 0.90, at the 5th and 95th percentiles of the distribution of OLS-estimated trends, showing several other trendline estimates as well. Although the spread of the estimates is large, the same estimators are consistently the ones closest to the true value of zero. These results can be used to provide more accurate estimates of linear trends in highly autocorrelated data, which is more important than simply increasing the precision of a potentially biased estimate.

One other thing of note. The practice of computing an autocorrelation-corrected probability (p) value, by adjusting the effective degrees of freedom of the residuals, using the formula N = n * (1-r)/(1+r) where n is the observed d.f. and r is the lag-1 autocorrelation coefficient, does not seem to be effective. This method appears to give far higher numbers of statistically significant results than are obtained by simulation of red noise series and observed frequencies of OLS-fitted line slopes.

Correction: The above paragraph is wrong. The formula given above is referred to as Quenouille’s method, dating to the late 1940s. It provides a correction to the p values computed by OLS regression that, though not perfect, will often improve them greatly. I was applying it incorrectly to the F statistic and distribution. See Nick Stokes’ comments here and elsewhere in that thread, for a discussion.
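For reference, the adjustment formula itself, N = n * (1-r)/(1+r), is simple to compute. The sketch below only evaluates the formula and a sample lag-1 autocorrelation; it does not show how to apply the adjusted value to a particular test statistic, which is where the misapplication described above occurred. The function names and the residual values are illustrative assumptions.

```python
def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation coefficient of a sequence."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(a * b for a, b in zip(dev, dev[1:]))
    den = sum(d * d for d in dev)
    return num / den


def effective_n(n, r):
    """Adjusted value N = n * (1 - r) / (1 + r)."""
    return n * (1.0 - r) / (1.0 + r)


# Made-up residuals with visible persistence, for illustration only.
residuals = [0.2, 0.5, 0.4, 0.1, -0.3, -0.6, -0.4, -0.1, 0.3, 0.4]
r = lag1_autocorrelation(residuals)
print(r, effective_n(len(residuals), r))
```

With r = 0 the formula leaves n unchanged, and as r approaches 1 the adjusted value shrinks toward zero.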

## 2 thoughts on “On trend estimates, part three”

1. Jim,
Could I try to inject some linear algebra here? You have a data vector y which you model as
y=b*x+e
where b is the trend and e the random-ish residuals. x is some known set of values, possibly linear time increments. Assume means of x and y have been subtracted.

Any of the trend estimates you are thinking of can be expressed as scalar product w.y where w is some vector of weights.

Any weighted sum of the residuals is going to reduce because of cancellation. If you can avoid cancellation in summing x, then the estimate
b=(w.y)/(w.x)
improves with more terms, because of that cancellation. Uniform weighting won’t do – sum(x) is zero. But w=x is ideal – terms in w*x are all positive. That’s OLS.
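A minimal sketch of this weighted estimate, assuming mean-centered x and y as stated above. The helper names and the data values are hypothetical.

```python
def centered(v):
    """Subtract the mean from every element."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    """Scalar product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


def weighted_trend(w, x, y):
    """Trend estimate b = (w.y) / (w.x) for mean-centered x and y."""
    return dot(w, y) / dot(w, x)


x = centered([0.0, 1.0, 2.0, 3.0, 4.0])
y = centered([1.0, 2.9, 5.2, 6.8, 9.1])

w_ols = x                              # w = x gives the OLS slope
w_ends = [-1.0, 0.0, 0.0, 0.0, 1.0]    # picks out last value minus first

print(weighted_trend(w_ols, x, y), weighted_trend(w_ends, x, y))
```

Uniform weights would make the denominator w.x equal to sum(x), which is zero after centering, so that choice is excluded.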

HaroldW’s algebra is really summation by parts. In integration by parts, you express the integral of u*v as the integral of the derivative of u by the indefinite integral of v. Same here, with differences vs cumulative sums. And the cumulative sums of w=x are quadratics.

Now you can express this in terms of a difference matrix D. This has zeros everywhere except 1’s on the diagonal, and a diag of -1’s directly to the left. If you apply this to the y’s you get the differences. Summation by parts just says that
w.y= w.D^-1 *D*y = (t(D)^-1*w).D*y. Here D^-1 is just a lower triangular matrix with all 1’s.
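The identity can be checked numerically without building the matrices, since D*y is the vector of differences (with y[0] kept as the first entry) and t(D)^-1 applied to w is a reverse cumulative sum. This is a sketch with made-up numbers and hypothetical helper names.

```python
def diff_with_first(y):
    """D*y: first entry is y[0], the rest are successive differences."""
    return [y[0]] + [b - a for a, b in zip(y, y[1:])]


def reverse_cumsum(w):
    """t(D)^-1 * w: entry i is the sum of w[i], w[i+1], ..., w[-1]."""
    out, total = [], 0.0
    for v in reversed(w):
        total += v
        out.append(total)
    return out[::-1]


def dot(a, b):
    """Scalar product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


w = [-2.0, -1.0, 0.0, 1.0, 2.0]   # centered time index, i.e. the OLS weights
y = [0.3, -0.1, 0.8, 1.9, 2.2]    # illustrative values only

lhs = dot(w, y)
rhs = dot(reverse_cumsum(w), diff_with_first(y))
print(lhs, rhs)
print(reverse_cumsum(w))          # weights that end up on the differences
```

The second printed line shows the weights applied to the differences; for w = x they rise and fall symmetrically, which matches the remark that the cumulative sums of w = x are quadratics.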

Now D doesn’t have to be a difference operator for that to be true. Very usefully, the off diagonal can have -alpha. Then it is the matrix that converts the residuals of your ar(1) process to white noise.
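The whitening property can be illustrated directly: build AR(1) residuals from known innovations, apply the operator with -alpha on the off diagonal, and the innovations come back. The sketch assumes alpha is known exactly and that the first residual equals the first innovation; both are simplifications for illustration, and the function names are hypothetical.

```python
import random


def ar1_from_innovations(innov, alpha):
    """Build e[t] = alpha * e[t-1] + innov[t], with e[0] = innov[0]."""
    e = [innov[0]]
    for w in innov[1:]:
        e.append(alpha * e[-1] + w)
    return e


def whiten(e, alpha):
    """Apply D with 1 on the diagonal and -alpha just below it."""
    return [e[0]] + [cur - alpha * prev for prev, cur in zip(e, e[1:])]


rng = random.Random(0)
innov = [rng.gauss(0.0, 1.0) for _ in range(8)]
e = ar1_from_innovations(innov, alpha=0.9)
recovered = whiten(e, alpha=0.9)

# Largest discrepancy between the original and recovered innovations.
print(max(abs(a - b) for a, b in zip(innov, recovered)))
```

Setting alpha = 1 in `whiten` reduces it to the plain difference operator described above.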

I blogged about my notions of how to do ar(n) regression here.

• Nick, I think we’re talking past each other. I’m not sure what your point is there exactly. My point is that trend lines computed using different criteria (i.e. different weightings of the first differences) than OLS can return better trend estimates when the ac is high, and these can therefore increase the accuracy and precision of such estimates.