24 November 2010

Verifying forecasts 1

I already discussed my earlier sea ice estimates and how they came out, but a few things have happened since then to occasion a two-part look at forecast verification.  As usual, it's prompted by seeing someone do it wrong.

One of the errors, one I have to remedy on my own part, is failing to verify (compare to reality) all of your forecasts.  I think that the end of May ice estimates are the most interesting and important, rather than those later in the year.  Partly this is because of how I think the sea ice pack behaves.  Partly it is because the practical uses of sea ice information I know of require that kind of lead time.  It takes a long time to get a tanker up to Barrow from Seattle, for instance.

Xingren and I did submit a later estimate, for the August Sea Ice Outlook.  That estimated 4.60 million km^2 for the September average sea ice cover.  An excellent approximation to the NSIDC's reported minimum (4.60), but not as good compared to the observed average extent of 4.90.  Actually a touch worse than our May estimate of 5.13 from the model (made May 30th, even though not reported by SEARCH until June).  Both estimates were well within 1 standard deviation of the natural variability (errors of +0.23 and -0.30 for May's and August's predictions, respectively, versus about 0.5 for the natural variability).  So, on the whole, pretty reasonable.  Just that we'd have expected better from the later estimate.   But ... there's more to that story ...
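The verification arithmetic above is simple enough to sketch in a few lines of Python.  The numbers come straight from the text; the variable and label names are mine, purely illustrative, not from any operational system:

```python
# Verification sketch using the numbers quoted above (million km^2).
# All names here are illustrative, not from any operational system.
observed_sept_avg = 4.90   # NSIDC September average extent

forecasts = {
    "May (from December state)": 5.13,
    "August (from April state)": 4.60,
}

natural_variability = 0.5  # roughly 1 standard deviation

for label, predicted in forecasts.items():
    error = predicted - observed_sept_avg
    within = abs(error) <= natural_variability
    print(f"{label}: error {error:+.2f}, within 1 sigma: {within}")
```

Both errors come out well inside the 0.5 of natural variability, which is the sense in which both forecasts count as reasonable.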

The two more interesting parts of the story are that scientists can't always do what they want, the way they want to do it, and that sometimes you're better off making your predictions from farther ahead.  The opposite of that is known in El Nino prediction as the 'loss of predictability': you can do a better job of predicting El Ninos from spring than from summer (I may have misremembered the exact timing, but it's about a 4 month difference).

The practicality part is that our May estimate was actually from an analyzed state from December of the preceding year.  We were really predicting 9 months ahead, not the apparent 4.  The analysis system had only gotten that far.  Our August estimate was from end of April conditions, not August.  By next year's sea ice outlook period, we'll have the option of starting with current initial conditions.

I also found it interesting that the two estimates had opposite errors -- December's conditions leading to too much ice extent, April's leading to too little.  That's something for us to follow up.  We're correcting a very large bias in the model; perhaps the magnitude or the nature of the bias changes seasonally?  Have to do some experiments to figure that out.

That change is also what points me to thinking that we will want to explore whether we have a 'loss of predictability' situation going on in the Arctic.  At the seasonal maximum extent, we have 13-14 million square km of ice ranging from 10 cm to a few meters thick.  At seasonal minimum, 9 million square km of that has melted away, so for those areas we know the ice is exactly 0 cm thick.  For the remaining 4.6 (this year), we only know that the thickness still ranges from 10 cm to a few meters.  Maybe starting from the previous ice minimum will be the best way to go -- most accurate initial conditions?  Then we hope that the coupled model is unbiased, so it does the right thing over the next year?  Again, some experiments to run.

I'll digress slightly again (you're shocked!):

It is very important to be specific about what you're predicting (seasonal minimum extent?  area?  September average extent?  area?) and how it's to be measured (NSIDC?  JAXA? ...).  Our August prediction of 4.60 is excellent, if you let us choose after the fact to say that we meant minimum extent rather than monthly average.  On the other hand, our model prediction from May is excellent if you let us say that it is for September's monthly average, rather than the minimum day (which we did say then), as measured by JAXA (which we did not).  JAXA's September average was 5.10 million km^2, vs. our May model estimate of 5.13.  http://www.ijis.iarc.uaf.edu/en/home/seaice_extent.htm

For both the monthly average (5.10 vs. 4.90) and the day's minimum (4.81 vs. 4.60), JAXA is about 0.2 larger than NSIDC.  That's one reason you have to specify your verification criterion.

As to why they differ, take a look at the area around, say, 75 N, 90 W -- the Canadian Archipelago.  There are a ton of islands up there, which means a ton of coastline.  That means it's hard to decide which grid cells to call 'land' and which 'ocean'.  I tried the experiment myself before, and you can change the area of the Arctic Ocean by about 1 million square km depending on how you decide what to call land.  Further, the Canadian Archipelago is generally a pretty icy place.  If your definition of 'land' puts more water there, you'll also report more ice.  Easily 0.2 million km^2.

This is another part of why I'm particular about specifying which verification data you're using.  It is an issue for everybody who creates a grid of data.  And, unfortunately, I've never seen any one best solution, either for the land masking or for transferring data from the original satellite observations onto that grid.  (If you know the one best way, let me know.  Minus a bunch of points to you if I've already tried it :-(, but plus a bunch if it's new and I can sell my colleagues on using it.)  Of course then you should also be asking just how much area of water our model has in the Arctic Ocean.  ... maybe we did better, or worse, than we think?
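To make the land-masking sensitivity concrete, here is a toy sketch.  The grid and the land fractions are entirely made up; only the principle matches what's described above.  Each coarse cell carries a land fraction from some higher-resolution mask, and the threshold you pick for calling a cell 'ocean' changes the ocean area you report:

```python
# Toy illustration of the land-mask decision described above.
# The cells and fractions are invented; only the principle is real.
def ocean_area(land_fractions, cell_area_km2, land_threshold):
    """Count a cell as ocean if its land fraction is below the threshold."""
    return sum(cell_area_km2 for f in land_fractions if f < land_threshold)

# 20 coastal cells of 625 km^2 (25 km boxes): 10 open ocean, 10 mixed.
coastal_cells = [0.0] * 10 + [0.3] * 5 + [0.7] * 5

strict = ocean_area(coastal_cells, 625.0, 0.01)    # any land at all -> 'land'
majority = ocean_area(coastal_cells, 625.0, 0.5)   # majority rule
print(strict, majority)  # the two rules disagree by thousands of km^2
```

Run the same kind of count over the many thousands of coastal cells around the Canadian Archipelago and that per-cell ambiguity is how a swing on the order of 1 million km^2 in Arctic Ocean area can arise.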

Once you get serious about verifying predictions, there's a good amount of work involved.  It's only if the predictions are pretty poor that it's easy to assess them.


jacob l said...

any pointers to what you have done in gridding???
I'm pretty sure you're smart enough to try just not counting the questionable areas; any other dead horses I should know about, seeing as I'm starting at around -1,000,000 points in the hole?
thanks jacob l

Robert Grumbine said...

I first encountered this when I was trying to make a land mask for my grid (at that time 25.4 km on a polar stereographic grid) from someone else's higher resolution land mask (1/16th degree, on a latitude-longitude grid, call it 7 km). The simple thing to do is just count up how many cells from the high resolution grid fell into my lower resolution boxes, and what fraction were land. Clearly (turns out this isn't so clear if you consider how the satellite sensors work) if all of the points were water, you call it water on my grid. If they're all land, call it land. But how about if it's some of each? That's the million km^2 that you affect by your decision(s).
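The counting procedure described here can be sketched as follows.  This is a minimal version under simplifying assumptions: the real grids are polar stereographic and latitude-longitude, while this just aggregates square blocks of a rectangular mask:

```python
# Sketch of aggregating a high-resolution land mask (0 = water, 1 = land)
# into coarse cells by counting, as described above.  Grid details are
# simplified: square blocks stand in for the real map projections.
def land_fraction(hi_res_mask, block):
    """Return the land fraction of each block x block patch of the mask."""
    ny, nx = len(hi_res_mask), len(hi_res_mask[0])
    coarse = []
    for j in range(0, ny, block):
        row = []
        for i in range(0, nx, block):
            patch = [hi_res_mask[jj][ii]
                     for jj in range(j, min(j + block, ny))
                     for ii in range(i, min(i + block, nx))]
            row.append(sum(patch) / len(patch))
        coarse.append(row)
    return coarse
```

The whole masking question is then what to do with the coarse cells whose fraction comes out strictly between 0 and 1.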

It also turned out that the higher resolution mask was biased in favor of calling cells 'land' (for reasons that were sensible to the creators of the mask), so that if any portion of the roughly 50 km^2 cell were land, even just 1 km^2, then the cell was called land.

I do things differently now, working directly with bounding curves for the land, lakes, and islands. But there is still a fundamental, unavoidable, problem of what to do with cells that are partly land, partly water, or that are very close to land (suppose one edge of the grid cell runs right next to land the whole way, but never has land inside -- but your ocean satellite sensor sees everything within 12 km of its observation point?)

Steve L said...

As sea level rises, presumably the amount of land area in the Canadian Archipelago will decrease. Whether or not this is enough to affect long-range forecasts for sea ice (50 years?) would surely depend on the relief of those islands, which I know nothing about.