This graph shows three trend lines fitted to a standard sine function over one full period (x axis units are pi/16 radians). A sine function is used because it gives a particularly tidy example of highly autocorrelated data having strong internal variation but no actual trend. Non-periodic, but still highly autocorrelated, data will frequently show a similar general shape to this (see the last graph of the previous post on this topic, for example), in the sense that positive values tend to cluster together, as do the negatives.

Clearly, a standard OLS trend line fit to such a series gives a seriously wrong trend estimate. Tukey’s “robust” method (function “line” in R) is even worse. An average of the slopes at lag 1, by contrast, gives a perfect estimate. Why? Because the standard OLS method minimizes an “objective” function, namely the sum of the squared deviations from the trend line, and solves for that minimum analytically. The problem is that *that’s the wrong objective function* when data are highly autocorrelated. One should instead minimize the sum of the departures of some set of slope lines (= lag-x differences), taken from the data, from a central tendency thereof. Which is to say, take the mean or median thereof.
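As a rough illustration of the comparison described above, here is a minimal Python sketch (not the code used to make the graph). It assumes 33 samples of sin(x) spaced pi/16 apart, and the helper names `ols_slope` and `mean_lag1_slope` are made up for this example. Tukey's method is not reproduced here.

```python
import math

def ols_slope(y):
    """Least-squares slope of y against x = 0, 1, ..., N-1 (per sample step)."""
    n = len(y)
    xbar = (n - 1) / 2
    ybar = sum(y) / n
    num = sum((i - xbar) * (yi - ybar) for i, yi in enumerate(y))
    den = sum((i - xbar) ** 2 for i in range(n))
    return num / den

def mean_lag1_slope(y):
    """Unweighted mean of the first differences y[i+1] - y[i]."""
    diffs = [b - a for a, b in zip(y, y[1:])]
    return sum(diffs) / len(diffs)

# One full period of sin(x), sampled every pi/16 radians (assumed: 33 points).
y = [math.sin(i * math.pi / 16) for i in range(33)]

print("OLS slope per step:       ", ols_slope(y))        # clearly negative
print("mean lag-1 slope per step:", mean_lag1_slope(y))  # zero up to rounding
```

The true trend of a full sine period is zero, so the OLS value printed here is pure artifact, while the mean of the lag-1 slopes is zero to within floating-point rounding.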

When the autocorrelation in the data is due mainly to the existence of a real trend, the latter method will return an estimate not greatly different from that of OLS methods. But when the autocorrelation is instead due to series that are approaching a random walk status (AR(1) -> 1.0), but with no real trend, it will be far less biased than OLS (or Tukey’s method). That fact might have some importance.

*Updated comment*. To clarify a little on the cause of the OLS mis-estimates, it’s not actually the autocorrelation itself that causes the OLS-derived mis-estimates. It just increases the frequency of them (in proportion to the ac level). When noise is purely white, OLS still gets it wrong p percent of the time for any given p value. When noise is red, OLS gets it wrong more often, and more severely.

*Comment #2*. Note that the “mean slope” estimate is always superior to the OLS estimate, regardless of the phase of the sine wave, with the exception of phases starting at 0.5*pi and 1.5*pi, where they are equal.

Jim,

I don’t think this example is a valid criticism of OLS. Your preference for the red line is based on your knowledge of how the curve continues beyond the range. Without that, it doesn’t seem like the best fit at all. And no general algorithm can incorporate that kind of knowledge.

In fact, OLS does average the slopes. It just uses a weighted average that upweights the central values. It isn’t obvious that in general uniform weighting is better.

Hey Nick, thanks for the comment.

I can see how it might appear as you state, because I used a sine function to illustrate the point, which we know of course continues on ad infinitum. But the red line slope is *not* in fact based on this at all. The superior trend estimation occurs just as well if I use a simple first order autoregressive model, (x(t) = a * x(t-1) + w), especially as a approaches 1.0 from below (though it will not be perfectly unbiased, as in this case, because of random error, w). That’s exactly what the graphs in the previous article show. A general algorithm based on slope averaging (not necessarily at lag-1) is generally superior to OLS, I’m pretty sure of it.

I’ve never heard of OLS averaging any slopes. My understanding has always been that it simply minimizes the squared deviations from the trend line, by setting the first derivative to zero. But however it works, it clearly produces a wrong estimate in cases like this. Just as troubling is that the “robust” method of Tukey is even worse.

For the simple case of a time series sampled at equal intervals, Nick’s assertion that OLS trend is a weighted average favoring the central values can be easily shown. [And it is also true that OLS minimizes the sum of squared vertical deviations from the trend line.]

The OLS trend estimate is a linear combination of the y-values. Denoting the sample indices as 0 to N-1,

the OLS trend estimate (per sample interval) =

SUM{over i} W(i)y(i), where the i’th weight is

W(i) = ( i – (N-1)/2 ) / C,

with C=scale factor= (N)(N-1)(N+1)/12.

One can algebraically transform this into a linear combination of the first differences (= “slope at lag 1”).

Defining a length-(N-1) first difference vector by:

d(i) = y(i+1)-y(i) for i=0 to N-2,

OLS trend = SUM{over i} U(i)d(i),

where U(i)= ( (N/2)^2 – (i – (N-2)/2)^2 ) / C2,

with C2 = scale factor = (N)(N-1)(N+1)/6 = twice the C from the first formula.

The {U(i)} satisfy SUM{over i} U(i) = 1.

That is, the OLS trend is a weighted average of the first differences, with the weighting function being an inverted parabola, having maximal weight at the center, tapering off towards zero at the edges.
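The two weighting formulas above can be checked numerically. The following is a small sketch in Python using exact rational arithmetic; the function names `ols_weights` and `diff_weights` and the test series are arbitrary choices for illustration, not part of the original comment.

```python
from fractions import Fraction

def ols_weights(n):
    """W(i) = (i - (N-1)/2) / C, with C = N(N-1)(N+1)/12."""
    c = Fraction(n * (n - 1) * (n + 1), 12)
    return [(Fraction(i) - Fraction(n - 1, 2)) / c for i in range(n)]

def diff_weights(n):
    """U(i) = ((N/2)^2 - (i - (N-2)/2)^2) / C2, with C2 = N(N-1)(N+1)/6."""
    c2 = Fraction(n * (n - 1) * (n + 1), 6)
    half = Fraction(n, 2)
    mid = Fraction(n - 2, 2)
    return [(half ** 2 - (i - mid) ** 2) / c2 for i in range(n - 1)]

y = [Fraction(v) for v in (3, 1, 4, 1, 5, 9, 2, 6)]  # arbitrary test series
n = len(y)
d = [b - a for a, b in zip(y, y[1:])]                 # the N-1 first differences

slope_from_y = sum(w * v for w, v in zip(ols_weights(n), y))
slope_from_d = sum(u * v for u, v in zip(diff_weights(n), d))

print(slope_from_y == slope_from_d)  # both forms give the same slope
print(sum(diff_weights(n)) == 1)     # the U(i) sum to one
```

Using `Fraction` rather than floats keeps the comparison exact, so agreement here is not an accident of rounding.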

Thanks Harold, that’s interesting; I’ve never seen OLS explained in that way (and thanks for the clear notation btw). So this weighting scheme is a natural consequence of setting the first derivative of the squared differences to zero and solving for b?

Hi Jim,

First, glad you could understand the notation; I was afraid that without being able to express equations in graphic form, things wouldn’t be clear. Also, the line breaks didn’t come out the way they looked at entry.

The first equation is derived from the minimization criterion. The derivation is a little briefer using matrix notation, but easier to write in text using summation notation, so I’ll stick to that.

The sum of vertical error squared is:

E = SUM{over i} [ y(i) – (m x(i) + b) ]^2.

Setting the partial derivative of E with respect to m to zero yields

SUM x(i)[ y(i) – (m x(i) + b) ] = 0, or

Am + Bb = C,

where A = SUM x(i)^2, B = SUM x(i), C=SUM x(i)y(i)

Setting the partial derivative of E with respect to b to zero yields

SUM [ y(i) – (m x(i) + b) ] = 0, or

Bm + Db = F,

with D = N (sum of 1’s), and F = SUM y(i).

Solving the two linear equations in m and b gives

m = (DC-BF)/(AD-BB)

b = (-BC+AF)/(AD-BB)

Only F and C contain the {y(i)}, so one can see that the result is linear in the {y(i)} as one would expect.

The formulas become simpler if one symmetrizes, introducing x'(i) = x(i) – xbar, with xbar equal to the average of the {x(i)} = SUM x(i)/N. This forces the off-diagonal term B to zero, that is,

B = SUM x'(i) = 0.

Using the primed x-variables changes the intercept but leaves the slope unchanged.

In the new co-ordinate system with B=0, the equation for m reduces to

m = C/A = SUM [x'(i)y(i)] / SUM [x'(i)^2]

For the case of {x(i)} = {0, 1, …, N-1}, xbar = (N-1)/2. The A term is evaluated as

A = SUM x'(i)^2 = (N)(N-1)(N+1)/12

m = C/A = SUM x'(i)y(i) / A

= SUM W(i)y(i)

with W(i) = ( x(i) – (N-1)/2 )/A

as written above (with a change in the name of the scale factor).
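For anyone who wants to check the algebra, here is a short Python sketch under the same definitions of A, B, C, D and F. It compares the general two-equation solution for m with the centered form m = C/A, and confirms the closed form for A when x = 0, 1, ..., N-1. The function names and the sample data are illustrative assumptions only.

```python
from fractions import Fraction

def slope_normal_equations(x, y):
    """m = (DC - BF) / (AD - BB) from the two normal equations."""
    A = sum(xi * xi for xi in x)
    B = sum(x)
    C = sum(xi * yi for xi, yi in zip(x, y))
    D = len(x)
    F = sum(y)
    return Fraction(D * C - B * F, A * D - B * B)

def slope_centered(x, y):
    """m = C/A after replacing x(i) by x'(i) = x(i) - xbar, so that B = 0."""
    xbar = Fraction(sum(x), len(x))
    xp = [xi - xbar for xi in x]
    A = sum(v * v for v in xp)
    C = sum(v * yi for v, yi in zip(xp, y))
    return C / A

N = 8
x = list(range(N))               # x = 0, 1, ..., N-1
y = [3, 1, 4, 1, 5, 9, 2, 6]     # arbitrary integer test series

m_general = slope_normal_equations(x, y)
m_centered = slope_centered(x, y)
A_centered = sum((xi - Fraction(N - 1, 2)) ** 2 for xi in x)

print(m_general == m_centered)                             # same slope
print(A_centered == Fraction(N * (N - 1) * (N + 1), 12))   # closed form for A
```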

The conversion to the 2nd equation (weighted sum of first differences) I haven’t seen in textbooks. I had also noticed, as Nick has remarked, that the unweighted average of first differences cancels out all y-values but the first and last.

It’s not possible to generate an equivalent formula from the set of N-n lag-n slopes. The lag-1 formula “works” because one can reconstruct the {y(i)} from the N-1 first differences, up to an arbitrary additive constant y(0) which doesn’t affect the OLS slope. With second differences, there are two arbitrary constants, say y(0) and y(1); the OLS calculation will vary with y(1)-y(0). This is only to say that a lag-n average isn’t mathematically identical to the OLS slope. Yet it is used; for example, one can recast the process of computing monthly anomalies, and then calculating the OLS slope on the anomaly series, as a weighted average of the lag-12 slopes.

Thanks again Harold. What’s funny about this is that just yesterday I was asking myself “what are those least squares equations again exactly?” but was too lazy to look them up.

Jim,

Just one other comment on averaging the slopes. If you use a uniformly weighted average with uniform spacing, the slope you get will be just what you’d get from connecting the first and last point (think of what you get when you sum all those differences). The other data doesn’t contribute. That’s not good.
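The telescoping is easy to see in a few lines of Python. This is only a sketch with made-up numbers; `mean_first_difference` is a hypothetical helper name.

```python
import math

def mean_first_difference(y):
    """Uniformly weighted average of the lag-1 slopes y[i+1] - y[i]."""
    diffs = [b - a for a, b in zip(y, y[1:])]
    return sum(diffs) / len(diffs)

y = [2.0, 3.5, 1.0, 4.0, 6.5, 5.0]            # made-up series
endpoint_slope = (y[-1] - y[0]) / (len(y) - 1)

# The interior terms cancel in the sum, leaving only the two end points.
print(math.isclose(mean_first_difference(y), endpoint_slope))

# Changing interior values therefore leaves the estimate unchanged.
y_changed = [2.0, -10.0, 40.0, 0.0, 7.0, 5.0]
print(math.isclose(mean_first_difference(y_changed), endpoint_slope))
```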

Exactly correct Nick; the resulting slope will therefore be highly dependent on the first and last values, and many times when the OLS based trend estimate is ~ correct, the mean slopes-based estimate will over-estimate it. However, if you use longer lags, this problem will be ameliorated. I’m exploring that issue right now. I just used lag-1 as a first cut example.

The other point on that issue is that taking the median slope, rather than the mean, can also help significantly.
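A minimal Python sketch of the mean versus median idea follows. It assumes a lag-k slope is defined as (y[i+k] - y[i]) / k, which is one reasonable reading and not necessarily the exact definition used for the graphs; `lag_slopes` is a hypothetical helper, and the data are contrived so that only the last value is off the line.

```python
from statistics import mean, median

def lag_slopes(y, k=1):
    """All lag-k slopes (y[i+k] - y[i]) / k (assumed definition)."""
    return [(y[i + k] - y[i]) / k for i in range(len(y) - k)]

# A straight line of slope 2 whose final value is pushed up by 8.
y = [2 * i for i in range(9)]
y[-1] += 8

print(mean(lag_slopes(y, 1)))    # 3.0: pulled up by the odd end point
print(median(lag_slopes(y, 1)))  # 2.0: unaffected by the single odd slope
print(median(lag_slopes(y, 3)))  # 2.0: a longer lag, same central value
```

This shows only one contrived case, an odd end value, so it says nothing about how the estimators compare on autocorrelated noise in general.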

I edited Harold’s two comments above to make the lines break better.