24 July 2009

Introductory Time Series Analysis

As often happens, this note is prompted by someone doing things that look wrong. Time series analysis has been an interest of mine since I earned my Master's (looking at tidal signals in current meter data). And there are some elements which I think are eminently understandable without diving into the gory details. So here is my shot at a very short, only marginally mathematical, introduction to time series analysis.

A time series is just a series of observations (of something, anything) through time. The monthly averaged global mean temperatures that I use for demonstrating principles of climate are time series. Monthly Southern Oscillation Index is another time series. Not coincidentally, those are the ones examined in the paper, and are what I'll consider here. But there are innumerable other time series -- daily close of the stock exchange, daily temperature, your weight day by day, and so on.

Much of the language for talking about time series is fairly ordinary. But there are a few terms I'd like to be sure we're using the same way. Most important is 'period'. The period of something in your time series is the length of time from peak to peak (or trough to trough). Consider hourly temperatures in your area. They peak each day, around, say, 3 PM. The period is then 24 hours. If we take a longer view, and consider daily high temperature, then the temperatures peak each year -- a period of 1 year. If we think of the brightness of the moon, it has a period (full moon to full moon) of 29.53 days. And so on. We could also ask how often the peak occurs per year (or other time of interest). This is the frequency -- how frequently do you hit the peaks. For hourly temperatures, the frequency is 365 per year (or 365 cycles per year). The frequency of the lunar cycle is 12.369 cycles per year. And the seasonal cycle has a frequency of 1 cycle per year.
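
The conversion between period and frequency is just a division. Here is a minimal sketch in Python; the function names and the 365.25 day year are my assumptions for illustration (the text above rounds the daily cycle to 365 per year).

```python
DAYS_PER_YEAR = 365.25  # assumed mean length of a year, in days


def cycles_per_year(period_days):
    """Frequency, in cycles per year, of something with the given period in days."""
    return DAYS_PER_YEAR / period_days


def period_in_days(frequency_per_year):
    """Period, in days, of something with the given frequency in cycles per year."""
    return DAYS_PER_YEAR / frequency_per_year


lunar = cycles_per_year(29.53)          # full moon to full moon
daily = cycles_per_year(1.0)            # the daily temperature cycle
annual = cycles_per_year(DAYS_PER_YEAR)  # the seasonal cycle

print(f"lunar cycle:  {lunar:.3f} cycles per year")
print(f"daily cycle:  {daily:.2f} cycles per year")
print(f"annual cycle: {annual:.1f} cycle per year")
```

Going the other way, `period_in_days(lunar)` gives back the 29.53 days we started with.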

Scientists tend to use the terms period and frequency interchangeably, since the value of one can always be converted to a value of the other. In talking about them, we sometimes also hear about long/short periods, or high/low frequency. For periods, it means what ordinary English would lead you to think -- long periods are periods that take a long time from peak to peak, and short periods are fast from peak to peak. You still have to know what is 'long' for the system to read any given paper correctly. If it's a geologist, they could mean 400 million years when they say 'long period' (the period for continental drift cycling), where a meteorologist might mean 40 years. High frequencies correspond to short periods (since the period is short, the thing happens often -- at high frequency). Low frequencies have long periods. The similarity of terms with music is no accident. Low frequency sound (low pitch) has a long period, while high frequency sound has a short period.

So when you hit anything relating to time series, if you know about music, think in terms of frequencies and pitches. Another term that comes up in time series analysis, with a slightly different meaning than in music, is 'harmonic'. Unlike music, with its fifths, thirds, etc., in time series we work more simply. There is the base period/frequency. And then there are the integer multiples of that base frequency. The annual cycle's harmonics are 1 (the base period), 2 (6 month period, 2 cycles per year), 3 (4 month period, 3 cycles per year), 4 (you get the idea), and higher. In practice much of what is in our weather time series can be captured by the first 4 harmonics of the annual cycle. (That's interesting in its own right -- it means that there is relatively little happening at periods of 5, 7, 8, 9 months, even though there's a lot at 4, 6, 12 months.)
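
To make the harmonics concrete, here is a small Python sketch that lists the first few harmonics of a base period. The function name and its defaults are hypothetical, chosen only for this illustration.

```python
def harmonics(base_period_months=12.0, count=4):
    """List (harmonic number, period in months) for the first `count` harmonics.

    Harmonic n has n cycles per base period, so its period is the
    base period divided by n.
    """
    return [(n, base_period_months / n) for n in range(1, count + 1)]


annual_harmonics = harmonics()
for n, period in annual_harmonics:
    print(f"harmonic {n}: {n} cycle(s) per year, period {period:g} months")
```

Harmonic 4 of the annual cycle comes out at a 3 month period, 4 cycles per year.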

As with music, we also are interested in how loud the frequency is. Our measure there is called amplitude. It is half the distance from peak to trough. Where I live, our peak summer high temperatures are about 32 C (90 F), and in the winter, our lowest highs are about ... call it -10 C (14 F). The range is 42 C, so the amplitude of our seasonal cycle is 21 C.
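
The arithmetic, as a short Python sketch (the function name is mine, for illustration only):

```python
def amplitude(peak, trough):
    """Amplitude is half the distance from peak to trough."""
    return (peak - trough) / 2.0


# Seasonal cycle of daily highs from the example above, in degrees C.
seasonal_range = 32.0 - (-10.0)
seasonal_amplitude = amplitude(32.0, -10.0)

print(f"range {seasonal_range:g} C, amplitude {seasonal_amplitude:g} C")
```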

Again following music, there's usually more than one frequency being played at a time. This is certainly true for weather! Many different things are happening all the time. In music, the description of all the notes the band or orchestra are playing at a time is called the score. For time series analysis, it is the spectrum. It is a little more involved in time series, because we can look at different spectra (1 spectrum, 2 or more spectra): the amplitude spectrum and the 'power' spectrum. Most work is actually done with the power spectrum, but the amplitude spectrum is easier to understand so I'll stay with that.
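
Here is a rough Python sketch of an amplitude spectrum. It builds a made-up 10 year monthly series from two 'notes' (an annual cycle of amplitude 21 and a 6 month cycle of amplitude 3) and then recovers those amplitudes with a plain discrete Fourier transform. The series, the numbers, and the function name are assumptions for illustration; real work would normally use an FFT library rather than this slow direct sum.

```python
import cmath
import math


def amplitude_spectrum(series):
    """Return (k, amplitude) pairs, where k is cycles per record length.

    Uses a direct discrete Fourier transform. The mean (k = 0) and the
    highest resolvable frequency are skipped to keep the scaling simple.
    """
    n = len(series)
    spectrum = []
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series))
        spectrum.append((k, 2.0 * abs(coeff) / n))
    return spectrum


years = 10
months = 12 * years
# Synthetic monthly series: annual cycle plus a weaker 6 month cycle.
series = [21.0 * math.cos(2 * math.pi * t / 12)
          + 3.0 * math.sin(2 * math.pi * t / 6)
          for t in range(months)]

# k cycles per 10 year record is k / 10 cycles per year.
by_cycles_per_year = {k / years: amp for k, amp in amplitude_spectrum(series)}

for freq in (0.5, 1.0, 2.0, 3.0):
    print(f"{freq:g} cycles per year: amplitude {by_cycles_per_year[freq]:.3f}")
```

The spectrum is essentially zero everywhere except at 1 and 2 cycles per year, where it returns the amplitudes that went in.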

One of the things we do in looking at climate time series is average the data -- construct a moving average, for instance. The moving average says to take the first (some number, let's say 12) months of data and average them together. Then step forward (move) 1 month, and average the 12 months starting there (months 2 through 13). Repeat until you're at the last 12 months of data. As I've suggested for understanding global climate, you want quite a bit more than 12 months of data in your averaging. But we can also try to understand weather. A 12 month average will clobber most of what is happening at periods shorter than 12 months (but not absolutely all of it, a point even scientists seem to forget -- it only completely clobbers the 12 month period and its harmonics), and let us look at what is happening on periods longer than 12 months (but some of that, too, gets damped). In musical terms, averaging suppresses the high notes, while leaving the bass line relatively unaffected.
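
A minimal Python sketch of the moving average, together with the standard textbook formula for how much of a pure sine wave's amplitude survives an evenly weighted moving average. The formula is not derived in this note and the function names are mine, so treat this as an illustration under those assumptions.

```python
import math


def moving_average(series, window=12):
    """Average each run of `window` consecutive values, stepping 1 at a time."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def moving_average_response(period, window=12):
    """Fraction of a sinusoid's amplitude left after the moving average.

    `period` and `window` are in the same units (months here).
    """
    f = 1.0 / period  # cycles per month
    return abs(math.sin(math.pi * f * window) / (window * math.sin(math.pi * f)))


# A pure annual cycle, 3 years of monthly values: the 12 month average removes it.
annual_cycle = [21.0 * math.cos(2 * math.pi * t / 12) for t in range(36)]
smoothed = moving_average(annual_cycle)
print(f"largest value left of the annual cycle: {max(abs(v) for v in smoothed):.2e}")

for period in (12, 6, 4, 8, 60):
    print(f"period {period:2d} months: {moving_average_response(period):.3f} survives")
```

The 12, 6, and 4 month periods are wiped out, but about a fifth of an 8 month cycle gets through, and a 5 year (60 month) cycle loses a bit over 6 percent of its amplitude.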

Suppose what we really want is to suppress the bass line and enhance the treble -- suppress the climate frequencies and focus on the weather. Rather than averaging, which is a smoothing operation that suppresses the high frequencies, we would take differences. Say take the difference between months 12 months apart. We can think of the temperature as being a certain amount of weather, plus a certain amount of climate. The climate will be nearly the same 12 months apart, so when we do the subtraction, it is cancelled out and we have only the difference in weather between those two months. Differencing is a sharpening operation that suppresses the bass line and enhances the treble. It is, however, an extremely biased operation -- not only does it suppress some and enhance others, but at the low frequency end the size of the effect is roughly proportional to the frequency. Unlike the averaging operation, which leaves low enough frequencies nearly unchanged, the differencing affects all frequencies and does so strongly.
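
A Python sketch of differencing, with the standard formula for how a difference at a given lag scales the amplitude of a pure sine wave. As with the averaging case, the formula is textbook material rather than something derived here, and the function names are hypothetical.

```python
import math


def difference(series, lag=12):
    """Subtract from each value the value `lag` steps earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def difference_response(period, lag=12):
    """Factor by which differencing scales a sinusoid's amplitude.

    `period` and `lag` are in the same units (months here).
    """
    return 2.0 * abs(math.sin(math.pi * lag / period))


# A steady 'climate' of 15 plus a repeating annual cycle: both cancel.
series = [15.0 + 21.0 * math.cos(2 * math.pi * t / 12) for t in range(48)]
diffs = difference(series)
print(f"largest 12 month difference: {max(abs(d) for d in diffs):.2e}")

for period in (2400, 1200, 24):
    print(f"period {period:4d} months: scaled by {difference_response(period):.4f}")
```

A 100 year (1200 month) cycle is cut to about 6 percent of its amplitude, a 200 year cycle to about 3 percent (half the frequency, half the response), while a 2 year cycle is doubled.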

Changing time series data in this way is called filtering. The averaging and differencing operations are different filters. There are many, many more that we could use. Any time we do use a filter, though, we should be careful that it isn't creating problems for us. This is one reason I try always to work with the most nearly original data possible -- no filtering has been done that could obscure the effects I'm trying to work with.

If you're a more visual person than a music person, the spectrum (guess where we stole that word from!) also has some color traits. If the amplitudes are all about the same, regardless of the frequency, then we call it a 'white' spectrum. (White light is approximately equal contributions of all colors.) Instead of period, for light we think of wavelength. Frequency is still frequency, but now it means how many times per meter you see the peaks. High frequency light is blue. Low frequency light is red. When we do the moving average, we are making the spectrum redder. When we do the differencing, we are making the spectrum bluer. Most climate-related time series have red spectra -- there are higher amplitudes at longer periods (wavelengths). Our year to year variations are on the order of 0.1 C, but the ice age cycles are 5 C, for instance.
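
One way to see the reddening and bluing is to start from a perfectly white amplitude spectrum and apply the two filters to it. The Python sketch below does that with the standard response formulas for a 12 month moving average and, as an assumption to keep the picture simple, a month to month (lag 1) difference. The 'low to high ratio' is a made-up measure for this illustration: above 1 means red, below 1 means blue.

```python
import math


def averaging_response(f, window=12):
    """Amplitude factor of an evenly weighted moving average; f in cycles per month."""
    return abs(math.sin(math.pi * f * window) / (window * math.sin(math.pi * f)))


def differencing_response(f, lag=1):
    """Amplitude factor of a lagged difference; f in cycles per month."""
    return 2.0 * abs(math.sin(math.pi * f * lag))


def low_to_high_ratio(amplitudes):
    """Mean amplitude of the lower frequencies over that of the higher ones."""
    half = len(amplitudes) // 2
    low, high = amplitudes[:half], amplitudes[half:]
    return (sum(low) / len(low)) / (sum(high) / len(high))


# Frequencies from 1 cycle per 10 years up to just under 1 cycle per 2 months.
freqs = [k / 120 for k in range(1, 60)]
white = [1.0 for _ in freqs]
reddened = [a * averaging_response(f) for a, f in zip(white, freqs)]
blued = [a * differencing_response(f) for a, f in zip(white, freqs)]

print(f"white:       {low_to_high_ratio(white):.2f}")
print(f"averaged:    {low_to_high_ratio(reddened):.2f}")
print(f"differenced: {low_to_high_ratio(blued):.2f}")
```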


Jesús R. said...

Then, period and cycle would be the same? And the number of periods/cycles per unit of time would be the frequency? Eg. hourly temperatures in my area -> 24 h period. We have 365 periods per year, so we have a frequency of 365 cycles per year?

And would the spectrum be the graph with the different frequencies in the x axis and the amplitude in the y axis?

I don't get very well the harmonic. Is it the number of cycles/periods per unit of time? (1 cycle per year = harmonic 1, 2 cycles per year harmonic 2...) I don't think so, because that looks the same as the frequency... Or is it an expression for the increase in frequency?

I understand the effects of the differences, but I don't imagine how it is done if it enhances treble. If it is just getting the difference between one year's data and the previous one, I would rather say that you are just looking at the treble, rather than enhancing it...

Does that "white spectrum" have anything to do with the "white noise"?

Don't worry, I think I will get it with the upcoming practical example. :)


Robert Grumbine said...

Period is cycle length. But yes, you have the math right in your example.

You're also right about the axes for an amplitude spectrum.

I hope the article on Monday shows you the treble problem clearly.

White noise is noise with a white spectrum, so yes, very much to do with each other.

Harmonics ... in Monday's figures, look down at the graph for the amplitude response as a function of frequency in cycles per year. It bottoms out at 1, 2, 3, 4, 5, 6 cycles per year. These are the harmonics of the 1 year averaging period. (Nothing really special about harmonics. If we know one frequency, we know the frequencies of all its harmonics. Sometimes that can be useful shorthand.)

Anonymous said...

Very interesting post. Is this basically how the 5-year averages work - the red line on the GISS graph? A good way to get rid of most of the noise. Stats are fascinating.

Thanks for linking to me in the last post :)

Robert Grumbine said...

Yes, that's what the 5 year average is for. I'd prefer a longer period myself, and probably a more sophisticated filter than the simple 5 year average. But its purpose on the GISS (and other) graphs is not to draw hard conclusions, but to provide a general eyeball sense of what the smoothed series looks like. It does that ok.

And you're quite welcome about the mention. Glad to see a new voice putting forward some thoughts about the science, and how nonprofessionals can recognize and approach it.

Hank Roberts said...

It's early to ask for thoughts, but eventually this ought to see publication:

I'd love to see your eventual comments on the blog version and later on how that compares to whatever the journal prints, and on the part of this that has to do with time series.