08 November 2010

Sea Ice Predictions vs Reality

Ok, I didn't jump on the end of the ice season.  But, the good thing about doing science is that being right or wrong, or learning from your mistakes (or learning from your right answers, even if that's harder to do) is not a matter of a 'news cycle' or what is currently 'hot' in the blogosphere.

The observed ice extent for September 2010, monthly average, from the National Snow and Ice Data Center was 4.90 million km^2.  One thing about making your predictions and deciding how well you did is that you also have to be specific about what you're going to compare against.  You'll find somewhat different figures from other data sources.

If you were dishonest, or just not careful, you might select whichever observation was closest to your  prediction.  The problem with that is that it then becomes easy to claim an accurate prediction -- with little regard for the quality of the prediction itself.  Just select the most favorable observation, or process the data yourself in your own way.  (By changing how you do your land masking, you can change your ice areas or extents by upwards of 1 million square km.  ... he said with no tinge of annoying experience.)

It turns out that my May predictions did pretty well. 
But first, some other predictive types.

One was the poll I put up for readers here (in which I also entered my two).  Two said 5-5.25 million km^2, so were a bit high.  3 had 4.75-5.0, which was the correct bin.  3 more took 4.50 to 4.75, a bit low.  4 each took 4-4.5 and under 4 million, so were fairly to very low.  This is a better showing than last year, when everybody (except William Connolley, who didn't enter the poll but did make a bet with me) was too low, some by quite a lot.

I also mentioned some simple predictors -- climatology and simple trends.  They did very poorly:

  • Climatology 1979-2000: 7.03 million km^2
  • Climatology 1979-2008: 6.67 million km^2
  • Linear Trend 1979-2009: 5.37 million km^2
Anyone saying Arctic ice has recovered is far wrong.  The extent hasn't even come back up to what the declining linear trend would predict.
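For readers who want to see what these two simple predictors amount to, here is a minimal sketch. It assumes a September-average extent record in million km^2; the series used below is a made-up, steadily declining placeholder, not NSIDC data, and the function names are hypothetical.

```python
import numpy as np

def climatology_predictor(years, extents, start, end):
    """Predict the next September extent as the mean over years start..end."""
    years = np.asarray(years)
    extents = np.asarray(extents, dtype=float)
    in_window = (years >= start) & (years <= end)
    return float(extents[in_window].mean())

def linear_trend_predictor(years, extents, start, end, target_year):
    """Fit a least-squares line over start..end and extrapolate to target_year."""
    years = np.asarray(years)
    extents = np.asarray(extents, dtype=float)
    in_window = (years >= start) & (years <= end)
    slope, intercept = np.polyfit(years[in_window], extents[in_window], 1)
    return float(slope * target_year + intercept)

# Made-up, steadily declining record (NOT real observations), 1979-2009.
years = list(range(1979, 2010))
extents = [7.5 - 0.07 * (y - 1979) for y in years]

print(climatology_predictor(years, extents, 1979, 2000))         # mean of the early years
print(linear_trend_predictor(years, extents, 1979, 2009, 2010))  # extrapolated 2010 value
```

On any declining record the climatology sits above recent values, which is why both climatologies came in so far above 4.90. The trend line tracks the decline, but as noted above, 2010 still came in below it.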

Now for mine (ours -- I was working with Xingren Wu):
  • Wu and Grumbine modeling: 5.13 million km^2
  • Grumbine and Wu statistical ensemble: 4.78 million km^2
  • Grumbine and Wu best fit statistical: 4.59 million km^2
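To keep the scoring honest, here is a small sketch of the comparison, using only the numbers in this post: each prediction minus the NSIDC September 2010 monthly average (4.90 million km^2), and that error as a fraction of the rough natural variability of 0.5 million km^2. The labels are mine.

```python
OBSERVED = 4.90            # NSIDC September 2010 monthly average, million km^2
NATURAL_VARIABILITY = 0.5  # rough interannual variability, million km^2

predictions = {
    "Climatology 1979-2000": 7.03,
    "Climatology 1979-2008": 6.67,
    "Linear trend 1979-2009": 5.37,
    "Wu and Grumbine modeling": 5.13,
    "Grumbine and Wu statistical ensemble": 4.78,
    "Grumbine and Wu best fit statistical": 4.59,
}

def errors(preds, observed):
    """Signed error (prediction - observation); positive means too much ice."""
    return {name: round(value - observed, 2) for name, value in preds.items()}

for name, err in errors(predictions, OBSERVED).items():
    # Size of each miss relative to the natural variability.
    ratio = abs(err) / NATURAL_VARIABILITY
    print(f"{name:40s} {err:+.2f}  ({ratio:.2f} x variability)")
```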

The best fit statistical one did an excellent job of predicting the September minimum, which was 4.60 million km^2.  The catch is that it wasn't trying to predict the minimum; it was predicting the monthly average.  This is another reason you have to be specific about just what you're predicting.  As last year, the best fit statistical prediction was too low.  Two years is a rather small sample, but being wrong in the same direction both times is something for us to keep in mind when working on next year's prediction: namely, finding some way of getting a predictor that's too high as often, and by as much, as it's too low.
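A quick way to see why two years is a small sample: if a predictor were unbiased, with each year's error independently equally likely to be high or low (an assumption, not something we've shown), the chance that all n errors land on the same side is 2 * 0.5^n. A minimal sketch:

```python
def prob_all_same_sign(n, p_low=0.5):
    """Chance that n independent errors all share a sign, given P(too low) = p_low."""
    return p_low ** n + (1.0 - p_low) ** n

def years_needed(alpha=0.05, p_low=0.5):
    """Smallest run of same-sign years that would be unlikely (< alpha) by chance."""
    n = 1
    while prob_all_same_sign(n, p_low) >= alpha:
        n += 1
    return n

print(prob_all_same_sign(2))  # two low years in a row: a coin-flip outcome
print(years_needed())         # run length needed before suspecting a bias
```

Under those assumptions, two low years happen half the time by chance; it would take a run of six before it looked unlikely at the 5% level.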

The ensemble statistical predictor did pretty well, off by 0.12 million square km.  I wouldn't want to mow that large an area!  But compared to the natural variability of about 0.5 million square km, it's a pretty good result.  So I think the ensemble approach did help us represent reality a little better.

The model (actually an ensemble of model runs) was only off by 0.23 million square km.  Again, pretty good compared to the natural variability.  More about how it got there in a moment.

If you average our two predictions, you get 4.95 million square km, off by only 0.05: extremely close.  While the sea ice outlook did that averaging, I won't take credit for that accuracy (just as I wouldn't have taken any blame for errors).  Again, we submitted two separate predictions precisely because we did not consider averaging the two to be meaningful.

With the model, we learned something useful: the prediction we entered was not exactly what came out of the model.  Our adjustment method is what produced the good prediction.  We knew the model was biased toward too much ice extent and area, and toward making the ice too thick.  A model which is consistently biased can actually be very useful, once you figure out how to correct for its biases.  A well-known weather model in the 1970s and 1980s was useful in this way.  Its rain predictions were always a factor of 2 wrong.  Once you divided (or multiplied, I forget which) the model's prediction by 2, you had very good guidance.
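For the general idea of correcting a consistently biased model, here is a minimal sketch of two common statistical corrections estimated from past forecast/observation pairs: a constant additive offset, and a least-squares multiplicative factor (the "factor of 2" case). This is not the correction we used for the ice model (that's the thickness threshold described next), and the hindcast numbers are made up.

```python
def additive_bias(forecasts, observations):
    """Mean forecast-minus-observation over past cases (positive = too high)."""
    return sum(f - o for f, o in zip(forecasts, observations)) / len(forecasts)

def multiplicative_factor(forecasts, observations):
    """Least-squares factor k minimizing sum((obs - k * forecast)^2)."""
    numerator = sum(f * o for f, o in zip(forecasts, observations))
    denominator = sum(f * f for f in forecasts)
    return numerator / denominator

# Made-up hindcast pairs where the model is always twice the observed value.
past_model = [2.0, 4.0, 6.0]
past_obs = [1.0, 2.0, 3.0]
k = multiplicative_factor(past_model, past_obs)  # 0.5, i.e. "divide by 2"
corrected = k * 10.0                             # apply it to a new raw forecast
print(k, corrected)
```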

Our hope was that we could figure out a way of using the model to get a better estimate than the model itself gave.  Then, if it worked, our method would shed light on how we could improve the model itself.  There's a fair chance that we have exactly that.  Our method was to use the area of ice that was thicker than 60 cm (or so), rather than ice thicker than zero (using the model straight).  That such a correction worked tells us that we might be much better off restarting the model with all ice, at least in the Arctic, 60 cm thinner.  If everything else is correct in the model system, this restart might cure all problems in the ice model.  (We're probably not going to be that lucky, but it's a direction of hope!)  Then we'd have accurate predictions without a need for bias correction.
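Here is a toy sketch of why the threshold correction points toward a thinner restart. It simplifies things: it counts grid-cell area by thickness alone, ignoring ice concentration, and the field, cell area, and function names are hypothetical. Counting ice thicker than 60 cm in the original field gives the same area as counting any ice at all after thinning every cell by 60 cm.

```python
import numpy as np

def ice_area_above(thickness_m, cell_area_km2, threshold_m=0.0):
    """Total area (km^2) of grid cells whose ice is thicker than threshold_m."""
    thickness_m = np.asarray(thickness_m, dtype=float)
    # Accept a single scalar cell area or a per-cell array of areas.
    areas = np.broadcast_to(np.asarray(cell_area_km2, dtype=float), thickness_m.shape)
    return float(areas[thickness_m > threshold_m].sum())

def thinned_restart(thickness_m, thinning_m=0.6):
    """Hypothetical restart field: every cell thinned by thinning_m, floored at zero."""
    return np.clip(np.asarray(thickness_m, dtype=float) - thinning_m, 0.0, None)

# Toy 2x3 thickness field in meters, 625 km^2 cells (illustrative only).
thickness = np.array([[0.0, 0.3, 0.7],
                      [1.2, 0.5, 2.0]])
raw = ice_area_above(thickness, 625.0)                          # model straight: ice > 0
corrected = ice_area_above(thickness, 625.0, 0.6)               # ice thicker than 60 cm
restarted = ice_area_above(thinned_restart(thickness), 625.0)   # thinned restart, ice > 0
print(raw, corrected, restarted)
```

The diagnostic equivalence only holds at the moment of the restart, of course; whether the model then evolves correctly from the thinner state is the real question.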

It is these sorts of things (finding good ideas on how to restart the model, looking to see how much variability is natural, seeing what kinds of statistical methods are useful, ...) that I find useful in the outlook process.  Something to help focus our thinking.  We could do such things anyhow, but it's sometimes also an interesting plus to work on the same problem as other people.  If nothing else, you have something to talk about in the hallway.


stevendm said...

Given that it is a relatively new phenomenon, how did you factor in the Arctic Dipole Anomaly (ADA)? It persisted throughout the summer months of 2007 and has been credited with contributing to the record minimum ice extent for that year. The ADA lasted part of the summers of 2009 and 2010, but not throughout. At this time, understanding of the occurrence and duration of the ADA is in its infancy.

Robert Grumbine said...

Hi Sinimod:
sometimes life is easy.

the statistical method is too simple (stupid) to know about the ADA. It just figures that certain aspects of the climate system are producing certain tendencies, and, if needed, new things will come up to keep the system going in that direction.

the coupled model either is correctly(-ish) producing the ADA, in which case it will give relatively good predictions, or it fails to do so and gives worse predictions. The thing is, ADA (and many other patterns we've named) is a name for us people for something we (think) we've seen in the data. It lets us look for the same pattern in the models, and maybe do something more directly by way of predictions (if there's an ADA index, maybe its value correlates with sea ice state some months later). I don't know of anyone using it in this way, but haven't looked. Will add that to my list :-)
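If such an index existed, the first check would be a lagged correlation: does the index at one time correlate with ice extent some steps later? A minimal sketch, assuming a hypothetical ADA index series and an ice series on the same time steps (no real index or data here):

```python
import numpy as np

def lagged_correlation(index, ice, lag):
    """Pearson correlation of index[t] with ice[t + lag]; lag >= 0 time steps."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    index = np.asarray(index, dtype=float)
    ice = np.asarray(ice, dtype=float)
    if lag > 0:
        # Pair each index value with the ice value `lag` steps later.
        index, ice = index[:-lag], ice[lag:]
    return float(np.corrcoef(index, ice)[0, 1])

def best_lag(index, ice, max_lag):
    """Lag in 0..max_lag with the largest absolute correlation."""
    return max(range(max_lag + 1), key=lambda k: abs(lagged_correlation(index, ice, k)))
```

With only a few decades of record, a strong correlation at one of many lags tried could easily be chance, so any candidate lag would need testing on independent years.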

Anonymous said...

eg, as I understand it, the ADA contributes to the "natural variability of about 0.5" that the OP discussed. A model which could predict ahead of time the ADA conditions for the summer season would presumably have a much smaller unexplained natural variability component... (alternatively, an interesting thing to do might be to do first a naive hindcast, then a hindcast with observed ADA, and see if the inclusion of ADA brings the hindcast closer to reality).


Will Crump said...

There is nothing particularly wrong with your model. The problem is thinking that it is possible to do more than pick a particular range for September average ice extent so far in advance. In looking at the graph, it appears that the chance the ice will have a higher extent than the previous year is about 50%. What has occurred is that the decline years show a greater decline on average than the increase years. The complexity of the variables that affect the ice and the large annual variation associated with each of the variables makes an annual prediction as difficult as weather prediction. Rather than try to pick next year's September average extent, try predicting what the five- or ten-year average for September Arctic ice extent will be for 2015 or 2020, respectively, and see how this prediction stands up. It will take longer to validate, but by using a multi-year average, you can eliminate some of the inter-annual variability.
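How much would a multi-year average help? A rough sketch, assuming year-to-year anomalies are independent with a standard deviation of about 0.5 million km^2 (the natural variability figure from the post). Real anomalies are likely correlated from year to year, which would make the reduction smaller than this.

```python
import math

def sd_of_mean(sigma, n_years):
    """Std. dev. of an n-year mean of independent anomalies with std. dev. sigma."""
    return sigma / math.sqrt(n_years)

def running_mean(values, window):
    """Trailing running mean; returns len(values) - window + 1 averages."""
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]

for n in (1, 5, 10):
    print(n, round(sd_of_mean(0.5, n), 3))  # spread of an n-year average
```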

Rattus Norvegicus said...

For an interesting statistical prediction (made in July) see Tamino's wrap up here. He didn't do too badly.

EliRabett said...

First a nit. By listing your predictions to a hundredth of a million sq km, you imply a precision of 1 part in ~500. It would be better, esp. for those who teach lab classes, to use a tenth, e.g. 5.1 rather than 5.12, and even better to state your uncertainty (something the ensemble and the statistical models should be able to handle easily).

Second, and this is something perhaps that your models can do, could you give us a series of maps predicting the ice cover? Eli is a greedy bunny.

Robert Grumbine said...

rattus: thanks, I'd have linked to tamino myself if my wrist had been better. I particularly like his title: "I got lucky". Dead right, on both counts.

in terms of the observations, 0.01 million km^2 is not unreasonable precision for a monthly average. That's 10,000 km^2, while each pixel on the satellite analysis (for each day) is ca. 625 km^2. The precision implies that on monthly average, the analysis is not wrong by much more than 16 pixels. If anything, that's a bit conservative.

The pixels which could be in error are those along the edge of the ice pack. Ballpark, that's a line ca. 4000 km long (Canada to Eastern Siberia, Greenland to Western Siberia), or 160 pixels. If we take the worst case, that the satellite is no better than a coin toss at classifying those edge pixels, the daily standard error would be sqrt(160*.5*.5), ca. 6.3 pixels. Average 30 of those figures together, and the observational standard error is ca. 1.2 pixels. So it wouldn't be outrageous to be reporting monthly average figures to 0.001 million km^2. Or at least that's my theoretician's take. How does it square with your lab guy's view?
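The back-of-envelope numbers above, written out as a sketch. It assumes 25 km x 25 km analysis pixels (ca. 625 km^2), a coin-toss chance of misclassifying each edge pixel, and independent errors from day to day. That last assumption is generous, since edge errors on successive days are probably correlated.

```python
import math

def pixels_in(area_km2, pixel_km=25.0):
    """Number of analysis pixels (pixel_km x pixel_km) in a given area."""
    return area_km2 / (pixel_km * pixel_km)

def monthly_edge_error_pixels(edge_km=4000.0, pixel_km=25.0, p_correct=0.5, days=30):
    """Standard error (pixels) of a monthly-mean extent from edge misclassification."""
    edge_pixels = edge_km / pixel_km                                # ca. 160 pixels
    daily = math.sqrt(edge_pixels * p_correct * (1.0 - p_correct))  # binomial SE, ca. 6.3
    return daily / math.sqrt(days)                                  # assumes independent days

print(pixels_in(10_000))            # 0.01 million km^2 reporting step, in pixels
print(monthly_edge_error_pixels())  # ca. 1.2 pixels
```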

I thought I'd reported the standard deviations for the two different predictions (pretty sure the exact numbers are in our sea ice outlook report). The statistical ensemble had about 0.45 million km^2, and the model ensemble was about 0.25 million km^2. The model agreed with itself pretty well. The fact that we had to shift the extent figure by about 4 of those standard deviations was one of the signs of how large the systematic error was.

I'll check with xingren about whether he can fix up a graphic of the extent (after correcting in the way we did).

EliRabett said...

It strikes me that 16 pixels is not very conservative. That may be what a model run predicts, but it would be surprising if the ensemble got anywhere near that across the entire range of runs.

Robert Grumbine said...

Eli: sorry I wasn't clearer. The pixels in question are the satellite analysis pixels, not model grid cells. It is the accuracy and precision of the analysis that I'm concerned about.

The models, as I gave figures for, have a much higher uncertainty. While the figures from models are rather precise, I agree that you can question reporting 2 digits in the prediction when the uncertainty in the prediction is much higher. You'll be glad to know that the sea ice outlook only reports 1 decimal place in the guesses. It'd probably be more reasonable to go with rounding to the nearest quarter million km^2. At least for some purposes.
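Rounding to the nearest quarter million km^2 and carrying the spread along could look something like this minimal sketch. The function names and output format are hypothetical, and Python's round() sends exact ties to the even multiple.

```python
def round_to_step(x, step=0.25):
    """Round x to the nearest multiple of step (ties go to the even multiple)."""
    return round(x / step) * step

def format_prediction(mean, sd, step=0.25):
    """Report a prediction rounded to `step`, with its spread to one decimal."""
    return f"{round_to_step(mean, step):.2f} +/- {sd:.1f} million km^2"

print(format_prediction(4.78, 0.45))  # statistical ensemble
print(format_prediction(5.13, 0.25))  # model ensemble (before any rounding of the spread)
```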