30 September 2009

Assessing predictions

It's a little premature to make a detailed assessment of the predictions for September's average extent, as the final numbers aren't in. They will be soon, but my focus here is on the question of how to go about doing the comparisons. Earlier, I talked about testing ideas, but there the concern was more one of how to find something that you could meaningfully test. Here, with September's average extent, we already have a well-defined, meaningful thing to look at.

Our concern now is to decide how to compare the observed September average extent with the climatological extent, and a prediction. While mine wasn't the best guess in the June summary at ARCUS, it was mine, so I know what the author had in mind.

Let's say that the true number will be 5.25 million km^2. My prediction was 4.92. The useless approach is to look at the two figures, see that they're different, and declare that my prediction was worthless. Now it might be, but you don't know that from just the fact that the prediction and the observation were different. Another part of my prediction was to note that the standard deviation of the prediction was 0.47 million km^2. That is a measure of the 'weather' involved in sea ice extents -- the September average extent has that much variation just because weather happens. Consequently, even if I were absolutely correct about both the mean (most likely value) and the standard deviation, I'd expect my prediction to be 'wrong' most of the time. 'Wrong', that is, in the useless sense that the observation differed by some observable amount from my prediction. The more useful approach is to allow for the fact that the predicted value really represents a distribution of possibilities -- while 4.92 is the most likely value from my prediction, 5.25 is still quite possible.
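To put a rough number on 'still quite possible', here is a minimal sketch. It assumes the spread follows a bell curve (normal distribution), and the function name is my own invention for illustration. It asks how often a forecast that is exactly right about the mean and the standard deviation would still miss by at least 0.33 million km^2.

```python
from statistics import NormalDist

PREDICTED = 4.92   # million km^2, the prediction quoted above
SIGMA = 0.47       # million km^2, the standard deviation quoted above
OBSERVED = 5.25    # million km^2, the 'let's say' value used above

def two_sided_tail(observed, mean, sigma):
    """Chance of landing at least this far from the mean, on either side.

    Assumes a normal distribution, which is an assumption, not a given.
    """
    z = abs(observed - mean) / sigma
    return 2.0 * (1.0 - NormalDist().cdf(z))

z = (OBSERVED - PREDICTED) / SIGMA
p = two_sided_tail(OBSERVED, PREDICTED, SIGMA)
print(f"miss of {z:.2f} standard deviations; chance of a miss this big or bigger: {p:.2f}")
```

Under those assumptions the miss is about 0.7 standard deviations, and a miss at least that large would turn up in nearly half of all years.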

We also like to have a 'null forecaster' to compare with. The 'null forecaster' is a particularly simple forecaster, one with no brains to speak of, and very little memory. You always want your prediction to do better than the null forecaster. Otherwise, people could do as well or better with far less effort than you're putting in. The first 'null forecaster' we reach for is climatology -- predict that things will be the way they 'usually' are. Lately, for sea ice, we've been seeing figures which are wildly different from any earlier observations, so we have to do more to decide what we mean by 'climatology' for sea ice. I noticed that the 50's, 60's, and 70's up to the start of the satellite era had as much or somewhat more ice than the early part of the satellite era (see Chapman and Walsh's data set at the NSIDC). My 'climatological' value for the purpose of making my prediction was 7.38 million km^2, the average of about the first 15 years of the satellite era. A 30 year average including the last 15 years of the pre-satellite era would be about that or a little higher. Again, that figure is part of a distribution, since even before the recent trend, there were years with more or less ice cover than climatology.

It may be a surprise, but we also should consider the natural variability in looking at the observed value for the month. Since we're really looking towards climate, we have in mind that if the weather this summer had been warmer, there'd be less September ice. And if it had been colder, or the wind patterns had been different, there would have been more ice this September. Again, the spread is the 0.47 (at least that's my estimate for the figure).

I'll make the assumption (because otherwise we don't know what to do) that the ranges form a nice bell curve, also known as 'normal distribution', also known as 'Gaussian distribution'. We can then plot each distribution -- from the observed, the prediction, and what climatology might say. They're in the figure:

This is one that makes a lot of sense immediately from the graphic. The Observed and Prediction curves overlap each other substantially, while the curves for Observed and Climatology are so far from each other that there's only the tiniest overlap (near 6.4). That tiny overlap occurs for an area where the curves are extremely low -- meaning that neither the observation nor the climatology is likely to produce a value near 6.4, and it gets worse if (as happened) what you saw was 5.25.
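For anyone who wants numbers to go with the picture, here is a minimal sketch of how the overlap could be measured. It assumes all three curves are normal with the same 0.47 spread, and it uses the shared area under two curves as the overlap measure. That measure is my choice for illustration, not something the figure itself defines.

```python
from statistics import NormalDist

SIGMA = 0.47  # million km^2, same spread assumed for all three curves

observed = NormalDist(5.25, SIGMA)
prediction = NormalDist(4.92, SIGMA)
climatology = NormalDist(7.38, SIGMA)

# Shared area under two curves: 1.0 means identical, 0.0 means no overlap.
overlap_pred = observed.overlap(prediction)
overlap_clim = observed.overlap(climatology)

# With equal spreads, two bell curves cross halfway between their means.
crossing = (observed.mean + climatology.mean) / 2

print(f"Observed vs Prediction overlap:  {overlap_pred:.3f}")
print(f"Observed vs Climatology overlap: {overlap_clim:.3f}")
print(f"Observed and Climatology cross near {crossing:.2f}")
print(f"Curve height there, relative to the peak: "
      f"{observed.pdf(crossing) / observed.pdf(observed.mean):.3f}")
```

Under these assumptions the Observed and Prediction curves share most of their area, the Observed and Climatology curves share only a few percent, and the crossing point sits a little above 6.3.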

The comparison of predictions gets harder if the predictions have different standard deviations. I could, for instance, have decided that although the natural variability was 0.47, I was not confident about my prediction method, and so taken twice as large a variability (for my prediction -- the natural variability for the observation and for the climatology is what it is and not subject to change by me). Obviously, that prediction would be worse than the one I made. Or at least it would be given the observed amount. If we'd really seen 4.25 instead of 5.25, I would have been better off with a less narrow prediction -- the curve would be flatter and lower at its peak, but higher out at 4.25. I'll leave that more complicated situation for a later note.
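A minimal sketch of that trade-off, assuming normal curves and using the height of each prediction curve at the observed value as the score. That scoring choice is an assumption for illustration; the later note may well use something else.

```python
from statistics import NormalDist

PREDICTED = 4.92                          # million km^2
narrow = NormalDist(PREDICTED, 0.47)      # the prediction as made
wide = NormalDist(PREDICTED, 2 * 0.47)    # the hypothetical less confident version

def density_ratio(observed):
    """Height of the narrow curve divided by height of the wide curve.

    Above 1: the narrow prediction put more weight on what was observed.
    Below 1: the wide prediction did.
    """
    return narrow.pdf(observed) / wide.pdf(observed)

for observed in (5.25, 4.25):
    print(f"observed {observed}: narrow/wide = {density_ratio(observed):.2f}")
```

With an observation of 5.25 the ratio comes out above 1, so the narrow prediction scores better. With 4.25 it drops below 1 and the wide one wins, though not by much.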

For now, though, we can look at the people who said that the sea ice pack had 'recovered' (which would mean 'got back to climatology') and see that they were horribly wrong. Far more so than any of the serious predictions in the sea ice outlook (June report; I confess I haven't read all of the later reports). The 'sea ice has recovered' folks are as wrong as a prediction of 3.1 million km^2 would have been. The lowest June prediction by far was 3.2, but the authors noted that it was an 'aggressive' prediction -- they'd skewed everything towards making the model come up with a low number. Their 'moderate' prediction was for a little over 4.7. Shift my yellow triangle curve 0.2 to the left and you have what theirs looks like -- still pretty close.
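The 3.1 figure is just the climatology miss mirrored to the other side of the observation. A minimal sketch of the arithmetic, using the numbers quoted in this post (5.25 is still the stand-in observation, and the variable names are mine):

```python
OBSERVED = 5.25      # million km^2, stand-in for the final number
CLIMATOLOGY = 7.38   # million km^2, what 'recovered' would mean
SIGMA = 0.47         # million km^2, natural variability estimate

miss = CLIMATOLOGY - OBSERVED    # how far 'recovered' is from the observation
mirror = OBSERVED - miss         # an equally bad miss on the low side

print(f"'recovered' misses by {miss:.2f} million km^2 ({miss / SIGMA:.1f} standard deviations)")
print(f"an equally wrong low prediction would be {mirror:.2f} million km^2")
```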

To go back to my prediction, it was far better than the null forecaster (climatology), so not 'worthless'. Or at least not by that measure. If the variability were small, however, then the curves would be narrow spikes. If the variability were 0.047, ten times smaller than it is, the curves would be near zero once you were more than a couple tenths away from the prediction. Then the distribution for my prediction would show almost no overlap with the observation and its distribution. Such a prediction might not be worthless (it was still closer than climatology), but it would be hard to argue that it had done much good.
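Here is a minimal sketch of how much the answer depends on the variability. It again assumes normal curves and measures overlap as shared area, both of which are choices made for illustration.

```python
from statistics import NormalDist

OBSERVED = 5.25    # million km^2
PREDICTED = 4.92   # million km^2

def overlap_for(sigma):
    """Shared area under the Observed and Prediction curves for a given spread."""
    return NormalDist(OBSERVED, sigma).overlap(NormalDist(PREDICTED, sigma))

actual = overlap_for(0.47)     # the variability estimated in this post
narrow = overlap_for(0.047)    # the hypothetical ten-times-smaller case

print(f"overlap with sigma 0.47:  {actual:.3f}")
print(f"overlap with sigma 0.047: {narrow:.5f}")
```

Same prediction, same observation, and the overlap goes from most of the area to well under a tenth of a percent.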


Jesús R. said...

Why does the observed September average (in blue) have a distribution (i.e. a standard deviation)? Given that the observed average is just one value (5.25), shouldn't it be just a vertical bar?

I think that we have about 30 years of satellite monitoring of sea ice. Why do you use the average of the first 15 years of the satellite era (or previous) to set the climatology? Given the melting trend since the beginning of the monitoring, it seems rather difficult that the ice pack jumps back to its state 30 years ago. Wouldn't it be more logical to take the last 15 years or the average of the whole satellite era as the climatology?


Bayesian Bouffant, FCD said...

"it's tough to make predictions, especially about the future."
- Yogi Berra

William M. Connolley said...

Now it is October. I could dig up the data but I bet you have it.

Robert Grumbine said...

In a different thread a reader gave the NSIDC link -- 5.36 million km^2. Much closer than I thought it would be, but it squeaks under our dividing line of 5.38. You lost this one. From your earlier posts at stoat, though, it looks like this is the only ice bet you lost.

crandles said...

I put both the 5.36 and the 5.4 figures on both of your blogs with a link to the data. I am looking forward to your next sea ice posts where I am expecting you both to claim victory ;)

C W Magee said...

Are random dudes on the internet null forecasters? Here is how they did:

crandles said...

How simple should the null forecaster be?

It seems rather obvious that if you want to claim skill for your forecast then you make the null forecaster particularly stupid.

This example of setting the null forecaster to 'what normally happens' and making that the average of the first 15 years of the satellite era seems to be doing precisely that.

I would go further than Jesus's comment:

Spotting a trend and assuming that it will continue is easy and I suspect most people would include that in 'what normally happens'.

"You always want your prediction to do better than the null forecaster. Otherwise, people could do as well or better with far less effort than you're putting in."

Well, of course you always want your prediction to do better; that goes without saying. Surely it may not always be possible to do better, but you want your prediction to be expected to be closer, or else your prediction is not doing anything useful.

Robert Grumbine said...

I'll be taking up some of this more in a full post. Some brief comments.

Jesus: The reason for a distribution vs. a very narrow curve (it won't be exactly a line because the observations have some uncertainties to them, and that means a width to the curve) is that to start with I'm looking at climate. In such a view, what actually happens in any particular year is a sample from climate -- there could have been more or less than we observed this time, and that would be within climate variation. As a weather forecast, we'd be checking against the much narrower curve.

Deciding what 'climatology' means, when you don't have much data in the first place, is a real problem. We know now, for instance, that there is a declining trend to the ice cover. But it is usual to consider climate a stable number. Given that the second half of the record is markedly not like the first half, we can't make that assumption. What I did was to take a part of the satellite record for which you could say there was no trend, and use that for the 'unchanging' part. It's about 15 years. 1979-1996 is the last span of the satellite record for which you could say the trend (from 1979 to that date) was not statistically significant (at 5% level).

I agree that the given null forecaster is particularly stupid. But since there are still people saying that the sea ice has recovered, or soon will, and that the recovery is to 1980s levels, I'll say that it is not unreasonably stupid. Granted, those people are usually not scientists. Still, this is a blog rather than a journal article.

Usual null forecasters are climatology (taken as a constant value, not a trend line), and persistence (this day/month/year will be like the last one). It's interesting to me that climatology is taken to have no trend. At this point, we certainly do expect change in sea ice extent. At least most of 'we' in the field do. Still, that is the norm.

Having spotted a trend, you could also continue it. That is only marginally simpler than what I did do. As far as this sort of consideration goes, the suggestion would be that my approach is a slightly better null forecaster than the straight line. The methods which would use this as a null are the coupled atmosphere-ocean-ice models.
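For concreteness, here is a minimal sketch of the two usual null forecasters plus the straight-line extrapolation. The function names are mine, and the numbers are made up purely to show the mechanics; they are not sea ice data.

```python
def climatology_forecast(extents):
    """Null forecaster: predict the long-term average, with no trend."""
    return sum(extents) / len(extents)

def persistence_forecast(extents):
    """Null forecaster: predict that next time repeats the most recent value."""
    return extents[-1]

def trend_forecast(years, extents, target_year):
    """Fit a least-squares straight line and extend it to target_year."""
    n = len(years)
    mean_year = sum(years) / n
    mean_extent = sum(extents) / n
    slope = (sum((y - mean_year) * (e - mean_extent) for y, e in zip(years, extents))
             / sum((y - mean_year) ** 2 for y in years))
    return mean_extent + slope * (target_year - mean_year)

# HYPOTHETICAL values, invented for illustration only -- not real observations.
years = [1, 2, 3, 4]
extents = [8.0, 7.0, 6.0, 5.0]

print("climatology:", climatology_forecast(extents))
print("persistence:", persistence_forecast(extents))
print("trend:      ", trend_forecast(years, extents, 5))
```

On a steadily declining series like this made-up one, climatology forecasts the highest value, persistence comes next, and the trend line comes in lowest.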