The paper that prompts this post (and the preceding Introduction to time series analysis) is McLean, de Freitas, and Carter, 2009. A reader suggested, in email, that I take a look. I'll recommend that to others as well. I won't carry out all suggestions, not least because I don't know all areas well enough to comment, but they are indeed welcome, and do at times result in a post here. There'll be some following notes, as this paper opens several issues. For now, I'll stay with just the paper.
Comments have already appeared at OpenMind, Initforthegold, and Realclimate. In a fundamental sense, I won't be adding anything new. But the approach will differ and might show some features in ways that you might have missed in the comments over there. For instance, I mentioned the crucial bit that I'll be exploring here in a comment at Initforthegold, and Michael missed its significance on first reading. The fundamental was staring him in the face, but fundamentals aren't always easy to notice. When he did, it was 'forehead slap' time.
I've tagged this 'doing science' and 'weeding sources', as well as 'climate change'. Some issues of peer review will show up, as will a flag or two of mine which I find useful in weeding sources. The nominal topic of the paper, "Influence of the Southern Oscillation on tropospheric temperature", is climate change. Recently I posted about scientific specificity. While it's entirely true that it doesn't work well to take that line in daily life, it's exactly what one should do with a scientific paper. One thing it means is that we keep an eye on whether the data used, and the way they are used, support the argument that is made.
We start by reading the abstract. As a matter of doing science, the abstract usually makes the most eye-catching statements in the paper. It is the advertising section of the paper, so to speak. You want to say something here that will interest other scientists and get them to read your brilliant work. In this case, "That mean global tropospheric temperature has for the last 50 years fallen and risen in close accord with the SOI of 5–7 months earlier shows the potential of natural forcing mechanisms to account for most of the temperature variation."
SOI is the Southern Oscillation Index. It provides a number that is connected to the El Nino-Southern Oscillation (ENSO), which can then be used for further research, such as this paper. There are different ways of defining an SOI, which might be an issue if the effects the authors were working with were fairly subtle. But, as they are referring to explaining 68-81% of the variance (the figure depends on which records are matched, and how large the domain examined is), we've left the realm of subtle. As the authors duly cite, there's nothing new in seeing a correlation between SOI and global mean temperatures. This is well-known. What is new is the extraordinarily high correlations they find, and that eye-catching conclusion that most of the temperature variation for the last 50 years is driven by SOI.
For atmospheric temperatures, they use the UAH lower tropospheric sounding temperatures (paragraph 5) and for SOI, they use the Australian Bureau of Meteorology's index (para 7). If the abstract were an accurate guide, we'd expect that with those two time series in hand, they computed the correlations and found those very high percentages of variance explained. Or at least that they were that high with the noted 5-7 month lag. And here's where we get to the time series analysis issue that I was introducing Friday.
Three different things are done to the data sets before computing the correlations. One is to exclude certain time spans for being contaminated by volcanic effects on the temperatures. No particular time series analysis issue here. But the other two both have marked effects on time series. First (para 10) is to perform a 12 month running average. This, as I discussed Friday, mostly suppresses effects that are 1 year and shorter in period. Second is to take the difference between those means, 12 months apart (paragraph 14). As I described on Friday, this suppresses long term variation, and enhances short term variation. They assert that this removes noise, while, in fact, it amplifies noise (high frequency/short period components of the record). Alternatively, they are defining 'noise' to be the long period part of the records -- the climate portion of the record.
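To make the two filters concrete, here is a minimal sketch in Python with NumPy. The function names (running_mean, lagged_difference, both_filters) are my own, and the trailing-window form of the average is an assumption; the paper may align its windows differently. The volcanic exclusion is not included.

```python
import numpy as np

def running_mean(x, window=12):
    """12 month running average (trailing window assumed).
    The output is shorter than the input by window - 1 points."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

def lagged_difference(x, lag=12):
    """Difference between values `lag` months apart: x[t] - x[t - lag]."""
    x = np.asarray(x, dtype=float)
    return x[lag:] - x[:-lag]

def both_filters(x, window=12, lag=12):
    """Running average followed by differencing of the averaged series."""
    return lagged_difference(running_mean(x, window), lag)

# Two toy inputs: a pure annual cycle and a pure linear trend.
months = np.arange(240)
annual_cycle = np.sin(2.0 * np.pi * months / 12.0)
linear_trend = 0.01 * months

# The running average wipes out the annual cycle ...
print("max |running mean of annual cycle|:",
      np.max(np.abs(running_mean(annual_cycle))))
# ... and the differencing turns a steady trend into a constant.
print("spread of differenced trend:", np.ptp(lagged_difference(linear_trend)))
```

The toy inputs are chosen so that each filter's job is visible in isolation: the averaging removes the 1 year cycle, and the differencing flattens a steady long-term trend into a constant offset.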
The combined effect of the two filters is that both the high frequency and the low frequency parts of the records are suppressed. What is left is whatever portion of the two records lies in the mid-range frequencies. To return to my music analogies, what has been done is to set your equalizer in an inverted V shape, with the highest amplitudes in mid-range. While the result has a connection to the original data, it is certainly no longer fair to say, as the authors do in the abstract, that their correlations are between SOI and temperatures.
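For readers who prefer the equalizer setting written down, here is a sketch of the idealized gain of each filter. It assumes monthly data, a plain 12-point average, and a difference at exactly 12 months lag; the symbols (x, y, z, H, f) are mine, with f in cycles per month.

```latex
\begin{align*}
y_t &= \frac{1}{12}\sum_{k=0}^{11} x_{t-k}
  &\Longrightarrow\quad
  |H_{\mathrm{avg}}(f)| &= \left|\frac{\sin(12\pi f)}{12\,\sin(\pi f)}\right| \\
z_t &= y_t - y_{t-12}
  &\Longrightarrow\quad
  |H_{\mathrm{diff}}(f)| &= 2\,\left|\sin(12\pi f)\right| \\
& & |H_{\mathrm{both}}(f)| &= |H_{\mathrm{avg}}(f)|\,|H_{\mathrm{diff}}(f)|
  = \frac{\sin^{2}(12\pi f)}{6\,\left|\sin(\pi f)\right|}
\end{align*}
```

Under these assumptions the combined gain goes to zero as f goes to zero (it behaves like 24 pi f for long periods) and is zero again at 1, 2, ..., 6 cycles per year, so whatever survives sits in between.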
Demonstration of filter effects -- sample series
The next 4 figures show a) the original time series, which I constructed by adding up some simple periodic functions, b) the 12 month running average version, c) the 12 month differencing of the original data, and d) the result of applying both filters as the authors did (minus volcanoes).
As expected, the running average smoothed out the series. In music terms, it suppressed the treble. That's the job of an averaging filter. The differencing made for a much choppier series than the original. That, too, can be desirable. But certainly the authors' comment about 'removing noise' is ill-founded. If we look at the variance in the time series, the original has a variance of 4.25. The running average decreased that to 2.69 (eliminating 37% of the variance). The differencing increased the variance by about 50%, to 6.47 (again, increased variance means more noise). Applying both filters produces the final figure, which has little resemblance to the original series. Not least, while the original looks to have a substantial amplitude at a period of 30 years (that appearance is entirely correct, I put in a 30 year period), the final product shows no sign whatever of the 30 year period. That is one of the jobs of a differencing filter -- remove the long period contributions. The filters have also suppressed the 15 year period that I put in, and, in general, turned my original series, which had equal contributions at 5 months, 1, 2, 3, 5, 7, 10, 15, 30 years, into something that looks mostly like a 3 year period (count the peaks and divide that into the time span for them) with a bit of noise.
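If you'd like to try this at home, the sketch below builds a test series of the same general kind and prints the variance after each filter. The periods come from the list above, but the unit amplitudes, zero phases, and 60 year length are my assumptions here, so the printed variances will not match 4.25, 2.69, and 6.47 exactly. The direction of each change is the point.

```python
import numpy as np

# Periods in months: 5 months, then 1, 2, 3, 5, 7, 10, 15, 30 years.
PERIODS_MONTHS = [5, 12, 24, 36, 60, 84, 120, 180, 360]

def make_test_series(n_months=720, periods=PERIODS_MONTHS):
    """Sum of sines with equal (unit) amplitude and zero phase.
    Amplitude, phase, and length are assumptions for illustration."""
    t = np.arange(n_months)
    series = np.zeros(n_months)
    for p in periods:
        series += np.sin(2.0 * np.pi * t / p)
    return series

def running_mean(x, window=12):
    """12 month running average; output loses window - 1 points."""
    return np.convolve(x, np.ones(window) / window, mode="valid")

def lagged_difference(x, lag=12):
    """Difference between values `lag` months apart."""
    return x[lag:] - x[:-lag]

original = make_test_series()
averaged = running_mean(original)
differenced = lagged_difference(original)
both = lagged_difference(averaged)

for name, series in [("original", original),
                     ("averaged", averaged),
                     ("differenced", differenced),
                     ("both", both)]:
    print(f"{name:12s} variance = {np.var(series):.2f}")
```

Plotting the four arrays against time gives a rough counterpart to the four figures described above.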
Filter effects on SOI series
That was a warm up with a test series, where we know that there are no data problems of any sort, and we know exactly what went in. The real data of course have problems (this is always true, and one of the aspects of doing science), but they may not have problems that affect our conclusions. The next figure shows the Australian SOI after smoothing (12 month running averages again) and then differencing (as in the paper); the curve labelled 'both' has both the averaging and the differencing applied to the original data. Note that I'm not showing the full curve, only 1950 to present, instead of 1879 to present -- the paper's analyses only covered, at most, 1958-2008.
You see that with both filters applied there are new peaks, missing peaks, and even the sign of the index can change (positive for negative, or vice versa). These are all signs that the filters have fundamentally altered the data set, so that whatever conclusion is drawn can only be drawn about 'data as processed by this filter', not the original data -- in contradiction to the statements in the paper and elsewhere by the authors that it is SOI that explains an extremely high portion of the variation in global mean temperature. Further, since the correlation is largely driven by the peaks, the high correlations can be largely a matter of how the filter creates or destroys peaks rather than the underlying data.
I mentioned Friday the amplitude spectrum, which shows the amplitudes of the contributions from each period. Filters change the amplitude spectrum. That's their job. One thing, then, that you do to describe the filter is divide the amplitude at a period after processing by the amplitude beforehand (this is known as the response function). An ideal filter will show a 1 for all periods except the ones you're trying to get rid of, where it will be 0. Real filters don't accomplish this, but that's the goal. So, to see the performance of the authors' filter, I took their original SOI series, processed it through their filter, and then found the response function in this way. Those are the next figures. First is looking at cycles per year (frequency), letting us see well what happens at high frequencies. Second looks at the period (from 1-15 years).
There are some spikes in the curves, which have nothing to do with the filter. All that is happening there is that these are periods/frequencies which have little signal in the original series, so numerical processing issues can have large effects there (dividing by small numbers is hazardous). But the smooth curve is a fair description. The averaging filter suppresses the signal (response is close to 0) for frequencies of 1, 2, 3, 4, 5, 6 cycles per year. (With monthly data, 6 cycles per year is the highest that can be analyzed -- 2 months period.) The differencing filter also suppresses the very low frequencies (long periods), as we expected even with just the basic introduction from Friday. But take a look between 1.5 and 7 years. The response is greater than the input! Look, too, at the periods which are being amplified. A usual description of ENSO is 'an oscillation with a period of 3-7 years'.
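Here is a rough sketch of both ways of getting at a response function: the idealized gain of a 12 month running average followed by 12 month differencing, and the empirical ratio of amplitude spectra after and before filtering. The function names are mine, the filter details are assumed as described in the text, and the input is a pure 3 year sine rather than the SOI record, which I am not reproducing here.

```python
import numpy as np

def combined_gain(period_months, window=12, lag=12):
    """Idealized gain of running mean then lagged difference at a given
    period (months). Valid for periods of 2 months or longer."""
    f = 1.0 / np.asarray(period_months, dtype=float)  # cycles per month
    avg = np.abs(np.sin(np.pi * f * window) / (window * np.sin(np.pi * f)))
    diff = 2.0 * np.abs(np.sin(np.pi * f * lag))
    return avg * diff

def both_filters(x, window=12, lag=12):
    """Running mean (trailing window assumed), then lagged difference."""
    y = np.convolve(x, np.ones(window) / window, mode="valid")
    return y[lag:] - y[:-lag]

def empirical_response(before, after):
    """Amplitude after filtering divided by amplitude before, per frequency.
    The shorter filtered series is zero-padded to the original length.
    Where the original has almost no signal the ratio is unreliable
    (huge, inf, or nan), which is where spikes come from."""
    n = len(before)
    amp_before = np.abs(np.fft.rfft(before - np.mean(before), n=n))
    amp_after = np.abs(np.fft.rfft(after - np.mean(after), n=n))
    freqs = np.fft.rfftfreq(n, d=1.0)  # cycles per month
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = amp_after / amp_before
    return freqs, ratio

# Idealized gain at a handful of periods.
for years in [1, 2, 3, 5, 7, 15, 30]:
    print(f"{years:2d} yr: gain = {float(combined_gain(12 * years)):.2f}")

# Empirical check on a stand-in series: a pure 3 year sine, 60 years long.
t = np.arange(720)
test_wave = np.sin(2.0 * np.pi * t / 36.0)
freqs, ratio = empirical_response(test_wave, both_filters(test_wave))
k = 20  # the frequency bin at 1/36 cycles per month for a 720 month record
print("empirical:", ratio[k], " idealized:", float(combined_gain(36)))
```

The empirical number comes out a little below the idealized one because the filtered series is 23 months shorter than the original. With real data, the ratio is only trustworthy at frequencies where the original series has appreciable amplitude.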
So what do we really have? It isn't a correlation between SOI and global mean temperatures. Both were heavily filtered. What the authors actually compute is the correlation between the SOI time series and global mean temperature -- if you over-weight (response function is greater than 1, so it's an over-weighting) both series towards what is happening in the ENSO periods. The conclusion should really be "If you look only in the ENSO window, you see that ENSO accounts for a lot of variation in global mean temperature." One problem is, that isn't a new result. We already knew that ENSO was important in the ENSO periods. More important to the paper, in so doing, the authors cannot make any conclusion about explaining "most of the temperature variation". They've filtered out much of it, and never examined either the response function or the effects of their filter on the inputs.
If what was desired was an analysis of global mean temperature response to SOI at ENSO periods, then the authors should both have been clear that this was their window and have used a more suitable filtering process. When one goes back to the paper, it's also clear that no justification was ever made for using either filter, much less both. The filters were arbitrary, and as I've mentioned, we prefer to avoid arbitrary decisions in our papers. If no objective basis for setting up the filters could be found, the authors should have demonstrated that alternate choices did not affect their conclusions.
So, some 'weeding sources', or 'scientific specificity' signs:
* When a paper makes a conclusion about the correlation between A and B, verify that it is A and B that they are correlating.
* If a filter is applied, look for the authors to discuss a) why a filter is being applied at all, and b) why the particular filter they chose was used.
As is my custom, I've sent an email to one of the authors (de Freitas, the only one whose email was given in the paper) about this comment.
Some of the following blog posts will talk about the peer-review aspects that let this paper through. For now, see my old article on peer review. One of the other notes (no idea when) will be about how the process continues after a bad paper gets through the peer review process. That is the comment and reply process, and I'll be writing Tamino about that (he's said in his comments that he's preparing a comment for the journal).