But most tempests in blog teapots are about trends. I'm going to swipe an idea from calculus/analysis and have a look at deciding about trends. One of the reasons to take a variety of courses, including ones that may not seem relevant at the time, is to have a good store of ideas to swipe. Er, a strong research background.
As before, I'm going to require that the trend -- to be specific, the slope of the best-fit line, where "best" means that the sum of the squares of the errors is as small as possible -- should become stable in the above sense. This fitting method is usually referred to as ordinary least squares, and even more breezily as OLS. I don't like that acronym, since I keep reading it as optical line scanner, courtesy of a remote sensing instrument.
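For readers who want to see the least-squares slope spelled out, here is a minimal sketch in plain Python. The function name `ols_slope` and the sample numbers are invented for illustration; this is one common way to write the calculation, not necessarily the code behind the results to come.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two points, and xs and ys of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    return sxy / sxx


# Made-up numbers, not real temperatures: a perfectly steady rise of 0.2 per year.
years = [2000, 2001, 2002, 2003]
values = [0.1, 0.3, 0.5, 0.7]
print(ols_slope(years, values))  # close to 0.2 for these invented numbers
```

The slope comes out in units of the variable per unit of time, so annual data in degrees gives degrees per year.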
There's a little more, however, that we can do. When I looked at averages, I took them centered on a given month or year. So estimating the climate temperature for 1978, say, involved using data from 1968 to 1988. The reason, which I didn't explain at the time, is that if climate is a slowly changing thing, then the temperature a year later tells you as much about this year as the temperature a year earlier. And, as a rule, tells you more about this year than the observations 2 years earlier.
My preference for a centered data span conflicts with what people would generally like to do -- to know what the climate of 2008 (for instance) was during, or at least only shortly after, 2008. On the other hand, you can't always get what you want. The priority for science is to represent something accurately. If you can't do that, then you have to keep working. A bad measure (method, observation, ...) is worse than no measure.
So we have two methods to look at already: 1) compute the trend using some years of data centered on our time of interest and 2) compute the trend using the same number of years of data but ending with our time of interest. I'll add a third: 3) compute the trend using the same number of years of data but starting with the year of interest. (This is the addition prompted by analysis.)
In numerical analysis, we refer to these as the forward, centered, and backward computations (we move forward towards the point/time of interest, we center ourselves at the point/time of interest, or we look backwards to the point of interest). For a wide variety of reasons, we generally prefer in numerical analysis to use centered computations. In real analysis (a different field), where one deals with infinitesimal quantities, it is required that the forward and backward methods give the same result -- or else the quantity (I'm thinking about defining a derivative) is considered not to exist at that point. We're not dealing with infinitesimals here, so we can't require that they be exactly equal. On the other hand, if the forward and backward methods give very different answers from each other, it greatly undermines our confidence in those methods. If the difference is large enough, we'll have to throw them out.
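To make the three choices concrete, here is a small sketch in Python of which years each computation would use. The function name `window_years` and the mode labels are hypothetical. The naming follows the description above: "forward" ends at the year of interest, "backward" starts there. It also assumes an odd number of years, so that the centered window has a true center.

```python
def window_years(year, span, mode):
    """Years used for a `span`-year trend at `year` (span must be odd)."""
    if span < 3 or span % 2 == 0:
        raise ValueError("span must be an odd number of years, at least 3")
    if mode == "forward":      # data leading up to, and ending at, the year
        first = year - (span - 1)
    elif mode == "centered":   # the same number of years on either side
        first = year - (span - 1) // 2
    elif mode == "backward":   # data starting at the year and running later
        first = year
    else:
        raise ValueError("mode must be 'forward', 'centered', or 'backward'")
    return list(range(first, first + span))


print(window_years(1978, 3, "forward"))    # [1976, 1977, 1978]
print(window_years(1978, 3, "centered"))   # [1977, 1978, 1979]
print(window_years(1978, 3, "backward"))   # [1978, 1979, 1980]
```

With a span of 21, the centered window for 1978 runs from 1968 to 1988, which matches the averaging example earlier.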
So what I will be doing -- note that I haven't done the computations yet, so I don't know how it will turn out -- is to
1) take a data set of a climate sort of variable (I'll pick on mean surface air temperature again since everybody does; specifically, the NCDC monthly global figures)
2) for every year from 31 years after the first year of data to 31 years before the last year of data
(I'm leaving 31 years at the start so that I can compute forward slopes over periods as long as that for the first year I show; likewise, the 31 years at the end are for the backward slopes)
a) Compute forward slope using 3-31 years (for 3, 5, 7, 9, .. 31)
b) Compute centered slope using 3-31 years (meaning the center year plus or minus 1, 2, 3, 4 ... to 15)
c) Compute backward slope using 3-31 years (again 3, 5, 7, 9, .. 31)
a-c) For each, look to see how long a period is needed for the result of the slope computation to settle down (as we did for the average). I expect that it will be the same 20-30 years, maybe longer, that the average took. If it's a lot faster, no problem. If it's longer, then I have to restart with, say, the data more than 51 years from either end.
3) Start intercomparisons:
a) compute differences between forward and backward slopes (matching up the record length -- only look at 3 years forward vs. 3 years backward, not vs. 23 years backward), and look for whether the differences tend toward zero as the length of record used increases. If not, the forward/backward methods are likely to be rejected. If so, then the span where the difference is close to zero is probably the required interval for slope determination.
b) ditto between the forward and centered slope computations. The differences will be smaller than in 3a, since about half the data the centered computation uses is also used by the forward computation. Still, I'll look for whether the two slopes converge towards each other. If they don't, then the forward computation is toast.
4) Write it up and show you the results. I'm planning this for next Monday. Those of you with the math skills are welcome (and encouraged) to take your own shot at it, especially if you use more sophisticated methods than ordinary least squares, or other data sets than NCDC. But I'll ask you to hold off on posting them to your blogs until after this one appears.
I'll also be providing links to sources (tamino, real climate, stoat, ... and others to be found) which have already done similar if not quite the same things.
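For anyone taking their own shot, steps 2 and 3 of the plan can be sketched roughly as follows in plain Python. Everything named here (`slope_at`, `slope_table`, `mean_abs_difference`, `fake_series`) is hypothetical, and the series is made-up annual data rather than the NCDC monthly figures, so the printed numbers say nothing about the real record. Summarizing the comparison as a mean absolute difference is also just one possible choice.

```python
import random

def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx

def slope_at(series, year, span, mode):
    """OLS slope over `span` years of `series` (a dict mapping year -> value)."""
    # How far before `year` each kind of window starts.
    offsets = {"forward": span - 1, "centered": (span - 1) // 2, "backward": 0}
    first = year - offsets[mode]
    years = list(range(first, first + span))
    return ols_slope(years, [series[y] for y in years])

SPANS = range(3, 32, 2)   # 3, 5, 7, ..., 31 years
MAX_SPAN = 31

def slope_table(series):
    """Map (year, span, mode) -> slope for years at least 31 from either end."""
    first, last = min(series), max(series)
    table = {}
    for year in range(first + MAX_SPAN, last - MAX_SPAN + 1):
        for span in SPANS:
            for mode in ("forward", "centered", "backward"):
                table[(year, span, mode)] = slope_at(series, year, span, mode)
    return table

def mean_abs_difference(table, span, mode_a, mode_b):
    """Average |slope_a - slope_b| over all years, for one record length."""
    diffs = [abs(table[(y, s, m)] - table[(y, s, mode_b)])
             for (y, s, m) in table if s == span and m == mode_a]
    return sum(diffs) / len(diffs)

# Made-up stand-in data: a steady 0.01 per year trend plus random noise.
rng = random.Random(0)
fake_series = {y: 0.01 * (y - 1880) + rng.gauss(0.0, 0.1)
               for y in range(1880, 2008)}

table = slope_table(fake_series)
for span in SPANS:
    # Step 3a: forward vs. backward, matched on record length.
    print(span, mean_abs_difference(table, span, "forward", "backward"))
```

Swapping `"backward"` for `"centered"` in the last line gives the comparison in step 3b. Reading in a real data set, and deciding what counts as "settled down," are left to the reader.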
Part of the idea here is to illustrate to my proverbial jr. high readers what a science project looks like, start to finish. Some aspects are:
- lay out a method before you start, and consider what it means both if the results are as you expect them to be, and if they're the other way around.
- consider what you'll do if they're different
- look at what other people have already done
- write it all up so that others can learn from what you did