01 May 2013

Assessing forecasts

This is actually part of pursuing whether ESMR was screwy, but I decided that, to show that nothing was up my sleeve, it was time to talk some about assessing forecasts.  That, and it's something I've been meaning to talk about for a while.  The thing is, forecast assessment is not nearly as simple as we sometimes think.  Having judged many a science fair project comparing weather forecasts, I've seen many of the same issues come up there, too.

For precipitation forecasts, people (science fairs included) often think about two scores: 'probability of detection' -- i.e., what fraction of the time that there's rain did the weather forecast call for rain -- and 'false alarm rate' -- what fraction of the time did you get no rain even though the forecast called for rain.  Both are potentially meaningful, and both have serious problems if used alone.
The meaningful side is that someone who really doesn't like rain, or needs it not to rain (say farmers after applying certain fertilizers/insecticides/...), wants a very high probability of detection (PoD). Conversely, if you take some expensive actions given a forecast of rain (rearrange your schedule, and you don't like doing that, apply some insecticides that need rain, ...) and it doesn't materialize, you want a very low false alarm rate (FAR).

But it is very easy to cheat either one of those measures of skill.  My PoD score will be perfect if I say, every day, that it will rain.  It's guaranteed, on the days that it rains, that this will have been my forecast.  But I'll have called for rain a lot of times when it didn't happen.  Conversely, I can get a perfect FAR score easily -- forecast that it will never rain.  If it ever does rain, then I'll be wrong, but the FAR doesn't care about that error.  PoD and FAR cover each other in this respect -- each is sensitive to the cheating you might do against the other.
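The two cheats are easy to see in a toy example.  Here's a sketch in Python, with a hypothetical 10-day rain record (the observations and the function names are mine, just for illustration; the score definitions follow the text above):

```python
# Hypothetical 10-day record: obs[i] is True when rain actually fell on day i.
obs = [True, False, False, True, False, True, False, False, False, True]

def pod(forecast, obs):
    """Probability of Detection: fraction of rainy days the forecast called for rain."""
    hits = sum(f and o for f, o in zip(forecast, obs))
    rainy_days = sum(obs)
    return hits / rainy_days

def far(forecast, obs):
    """False alarm rate: fraction of rain forecasts that verified with no rain."""
    false_alarms = sum(f and not o for f, o in zip(forecast, obs))
    rain_forecasts = sum(forecast)
    return false_alarms / rain_forecasts if rain_forecasts else 0.0

always_rain = [True] * len(obs)   # the cheat against PoD's blind spot
never_rain = [False] * len(obs)   # the cheat against FAR's blind spot

print(pod(always_rain, obs))  # 1.0 -- perfect PoD
print(far(always_rain, obs))  # 0.6 -- but 6 of the 10 forecasts were false alarms
print(far(never_rain, obs))   # 0.0 -- perfect FAR, earned by never forecasting rain
```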

There are actually 4 conditions that can occur:
  1. we say that there will be rain, and there is
  2. we say there will be rain, but there isn't
  3. we say no rain, but there is
  4. we say no rain, and there isn't
In cases 1 and 4, we're correct, and in 2 and 3 we're wrong.  So, another score we can use is % correct.  The benefit to this is that it measures both failure to predict rain and predicting rain that doesn't happen.  The drawback is that it gives equal weight to both (either is a 'miss') -- while most of us are more concerned about one error than the other.  In other words, no one score will tell us everything we want or need to know.
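All three scores fall out of the counts of those four cases.  A minimal sketch, with hypothetical counts (the numbers are made up for illustration):

```python
def scores(hits, false_alarms, misses, correct_negatives):
    """Scores from the four contingency-table counts:
    case 1 = hits, case 2 = false alarms, case 3 = misses, case 4 = correct negatives."""
    total = hits + false_alarms + misses + correct_negatives
    pod = hits / (hits + misses)                 # fraction of rainy days caught
    far = false_alarms / (hits + false_alarms)   # fraction of rain forecasts that were dry
    pct_correct = (hits + correct_negatives) / total  # weights both errors equally
    return pod, far, pct_correct

# A hypothetical 100-day season: 20 hits, 10 false alarms, 5 misses, 65 correct negatives.
pod, far, pct = scores(20, 10, 5, 65)
print(pod, far, pct)  # 0.8, 0.333..., 0.85
```

Note that % correct counts a false alarm and a miss the same, which is exactly the equal-weighting drawback described above.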

There are many more scores for this kind of situation.  But this is enough complexity for us to start looking at using black and white vision to analyze sea ice extent, that is ice / no ice analysis with the satellite data.

For my first trial, I took the SSMI on F-15, the same 19 GHz, horizontal polarization, channel that ESMR had, for August 1, 2011.  Then I lopped off all land points (no reason to give credit to an algorithm for noticing there's no sea ice on land), all northern hemisphere points (it's the Antarctic that's of most concern for the ESMR period), and all southern hemisphere points north of 48 S (an arbitrary round number -- the point being that we know before starting that there isn't and wasn't ice that far north from Antarctica).  That trimmed things down to about 200,000 observations.
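The trimming step is simple masking.  A sketch of the idea with NumPy, on synthetic stand-in data (the array names, sizes, and values here are mine, not the real SSMI fields):

```python
import numpy as np

# Stand-in data: tb holds 19H brightness temperatures, lat the latitudes,
# land_mask flags land cells.  Values are random placeholders for illustration.
rng = np.random.default_rng(0)
n = 500_000
tb = rng.uniform(80.0, 273.0, n)      # stand-in brightness temperatures (K)
lat = rng.uniform(-90.0, 90.0, n)     # stand-in latitudes (degrees)
land_mask = rng.random(n) < 0.3       # stand-in land flags

# Keep only ocean points south of 48 S: this drops land, the whole northern
# hemisphere, and southern hemisphere points north of 48 S in one condition.
keep = (~land_mask) & (lat < -48.0)
tb_trimmed = tb[keep]
print(tb_trimmed.size)
```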

 To start with, I just made a scatter plot of all the observed brightness temperatures against the concentration:
Given about 200,000 points, this isn't enormously clear.  But at least there's a sense that warmer brightness temperatures correspond to more sea ice.  Surprising as that may sound, it's reassuring.  The thing is, the ocean is a very poor emitter of microwave energy, so it looks very cold if you use a microwave channel -- as we are.  On the other hand, sea ice is a pretty good emitter of microwave energy, so it looks much 'warmer' in terms of brightness temperatures.  Remember, a brightness temperature tells us how hot an ideal black body would have to be to send us as much energy as we observe -- energy is the observation, not thermometer temperature.  It's also true, however, that warmer ice emits more energy than colder ice.  So part of what is happening in that plot is that high concentrations of very cold ice show up in different parts of the diagram than high concentrations of warm ice.
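At microwave frequencies the Rayleigh-Jeans approximation holds, so brightness temperature is roughly emissivity times physical temperature.  A quick sketch, with illustrative emissivity values that are my assumptions (roughly representative of open water and first-year ice at 19 GHz horizontal polarization, not values from the analysis):

```python
def brightness_temp(emissivity, physical_temp_k):
    """Rayleigh-Jeans approximation, good at microwave frequencies like 19 GHz:
    Tb is about emissivity times physical temperature."""
    return emissivity * physical_temp_k

# Assumed emissivities: open water is a poor emitter, sea ice a good one.
print(brightness_temp(0.45, 271.0))  # open water near freezing: ~122 K, looks 'cold'
print(brightness_temp(0.92, 250.0))  # sea ice at 250 K: 230 K, looks 'warm'
# The warm-ice vs. cold-ice effect in the scatter plot:
print(brightness_temp(0.92, 260.0))  # warmer ice -> higher Tb
print(brightness_temp(0.92, 240.0))  # colder ice -> lower Tb
```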

Having reassured myself that even with only 1 channel, and even with the ice temperature effect being ignored (for now), we get a plausible scatter plot, next is to think of some algorithm (method) for going from the observed brightness temperature to an ice / no ice decision.  What I'll do is take the algorithm "If the temperature is above this number, ice is present in the field of view".  And define "ice is present" to mean that the ice analysis has greater than 15% ice concentration.  The 'this number', the critical temperature in my algorithm, I'll simply vary all the way from 80 K (colder than the coldest in the diagram) to 273 K (melting point of ice) -- and see what the scores come out to be.
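The whole algorithm fits in a couple of lines.  A sketch in Python (the 15% cutoff and the idea of a critical temperature are from the text; the function names are mine):

```python
def ice_present(brightness_temp_k, critical_temp_k):
    """The one-channel algorithm: call ice present when the observed
    brightness temperature is above the critical temperature."""
    return brightness_temp_k > critical_temp_k

def analysis_says_ice(concentration):
    """'Ice is present' in the verifying analysis: concentration above 15%."""
    return concentration > 0.15

# A 150 K observation, scored against a 140 K critical temperature:
print(ice_present(150.0, 140.0))  # True
```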

If I take an extremely cold brightness temperature, I can get perfect Probability of Detection.  But the False Alarm Rate is horrible, as is the % correct.  The interesting zone for our algorithm evaluation is between about 120 and 160 K.  Probability of Detection is getting worse all the time through that range, but False Alarm Rate is improving (getting smaller).  The % correct rises rapidly to its peak, at a temperature around 140 K, and then declines slowly.  Let's magnify the upper parts of the curves for PoD and % correct:
Our PoD is steadily worsening, but, thanks to improving FAR, the % correct is improving up to about 140-150 K, where the scores are about the same, 97% correct, regardless of the critical temperature we choose.
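The sweep itself is a simple loop over critical temperatures.  Here's a sketch on synthetic stand-in data -- the distributions below (ice cells near 200 K, open water near 120 K) are my rough assumptions for illustration, not the real observations:

```python
import numpy as np

# Synthetic stand-ins: truth flags cells the ice analysis puts above 15%
# concentration; tb are brightness temperatures drawn from assumed distributions.
rng = np.random.default_rng(1)
n = 10_000
truth = rng.random(n) < 0.5
tb = np.where(truth, rng.normal(200.0, 25.0, n),   # ice: warm and spread out
                     rng.normal(120.0, 15.0, n))   # open water: cold

for crit in range(80, 274, 10):
    forecast = tb > crit
    hits = np.sum(forecast & truth)
    false_alarms = np.sum(forecast & ~truth)
    misses = np.sum(~forecast & truth)
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else 0.0
    pct = np.mean(forecast == truth)
    print(f"{crit:3d} K  PoD={pod:.3f}  FAR={far:.3f}  %correct={pct:.1%}")
```

As the critical temperature rises, PoD can only fall (the forecast-ice set shrinks) while FAR improves, and % correct peaks somewhere in between -- the same trade-off the real curves show.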

We're far from done, of course.  This is just one day, we've paid no attention to the possibility of using surface temperature estimates (climatology, weather analysis, other satellites, ...) to improve our estimate, the algorithm is extremely simple, and so on.  Also, this is a check for each individual observation.  For obtaining estimates of Antarctic sea ice extents, we want full grids of ice / no ice decisions.  Some of the observations are in the same cell as others, so perhaps the few we're getting wrong would be corrected by others that were in the same cell.  And ... add your own in the comments!

Still, the first exploration here suggests that the 1 channel only method can give a probability of detection better than 95%, and % correct around 97%.  We'll have to think some about whether false alarms are worse than failure to detect, and see how this holds up as we look at more days and methods.


Peter said...

To start with, I just made a scatter plot of all the observed brightness temperatures against the concentration:

Hang on, where did the concentration data come from? Isn't that also from SSM/I? In which case this isn't really about accuracy in calling ice presence/absence, it's about how well a single-channel algorithm agrees with a multi-channel algorithm.

Robert Grumbine said...

True enough Peter. As I mentioned, I was defining 'truth' to be what the full analysis (which actually used both SSMI from F-15 and SSMI-S from F-17) found.

However we view it, though, if the single channel does very badly compared to multi-channel, we have to question using ESMR for ice history. So far, single channel is not looking too horrible.

But it is indeed accuracy in calling ice presence/absence since I'm not checking concentration versus concentration.

Anonymous said...

NSIDC posted a discussion of pre-1979 Nimbus data:

"While the modern satellite data record for sea ice begins in late 1978, some data are available from earlier satellite programs. NSIDC has been involved in a project to map sea ice extent using visible and infrared band data from NASA’s Nimbus 1, 2, and 3 spacecraft, which were launched in 1964, 1966, and 1969. Analysis of the Nimbus data has revealed Antarctic sea ice extents that are significantly larger and smaller than seen in the modern 1979 to 2012 satellite passive microwave record. The September 1964 average ice extent for the Antarctic is 19.7 ± 0.3 million square kilometers (7.6 ± 0.1 million square miles). This is more than 250,000 square kilometers (97,000 square miles) greater than the 19.44 million square kilometers (7.51 million square miles) seen in 2012, the record maximum in the modern data record. However, in August 1966 the maximum sea ice extent fell to 15.9 ± 0.3 million square kilometers (6.1 ± 0.1 million square miles). This is more than 1.5 million square kilometers (579,000 square miles) below the passive microwave record low September of 17.5 million square kilometers (6.76 million square miles) set in 1986.

The early satellite data also reveal that September sea ice extent in the Arctic was broadly similar to the 1979 to 2000 average, at 6.9 million square kilometers (2.7 million square miles) versus the average of 7.04 million square kilometers (2.72 million square miles)."