03 April 2009

How much detail is there really?

I'm thinking about sea surface temperature (SST) these days, but the approach here can be applied to many situations, even ones outside weather and climate. A common, important, and not always easy question is: just how much detail do you need? The more detail, the more expensive it is to make a good product, whether that's an analysis of sea surface temperature, a climate model, or a surface in a video game. Of course, what I'd like is the sea surface temperature every few meters over the entire globe. If that were more than I needed for some purpose, I could always average it down. But ... it would take an awful lot of storage to save temperatures every few meters (my back yard, my neighbor's, my front yard, ...) over the whole globe.

Let's start by looking at an actual high-resolution global product, though not every few meters! The SST analysis at http://polar.ncep.noaa.gov/sst/ gives a value every 1/12th of a degree in latitude and longitude, about one every 9 km (6 miles). That comes to about 9 million values. Let's also suppose that this resolution is fine enough that everything important is represented.
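For a rough check on the grid spacings quoted in this post, here is a small sketch that converts a step in degrees to kilometers. It assumes a spherical Earth with a mean radius of 6371 km (about 111.2 km per degree); that is my approximation, not something taken from the SST product, and `spacing_km` is a hypothetical helper. East-west spacing shrinks with the cosine of latitude, so the 9 km figure holds near the equator.

```python
import math

# Assumption: spherical Earth with mean radius 6371 km (a common approximation).
EARTH_RADIUS_KM = 6371.0
KM_PER_DEGREE = 2 * math.pi * EARTH_RADIUS_KM / 360.0  # about 111.2 km


def spacing_km(step_deg, lat_deg=0.0):
    """Return (north-south, east-west) spacing in km for a grid step in degrees."""
    ns = step_deg * KM_PER_DEGREE
    # Meridians converge toward the poles, so east-west spacing shrinks with cos(lat).
    ew = ns * math.cos(math.radians(lat_deg))
    return ns, ew


if __name__ == "__main__":
    ns, ew = spacing_km(1 / 12)
    print(f"1/12 degree at the equator: {ns:.1f} km N-S, {ew:.1f} km E-W")
    print(f"1/12 degree at 60N: {spacing_km(1 / 12, 60)[1]:.1f} km E-W")
    print(f"2 degrees: {spacing_km(2)[0]:.0f} km N-S")
```

At the equator, 1/12 degree comes out near 9.3 km; the round numbers used later in the post (about 100 km per degree) are deliberately rough.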

The worst resolution is to use 1 number for the entire globe: the average over all ocean points. To measure how bad this is, I'm going to compute the root mean square error, often abbreviated RMSE. (Those who know what this is can skip to the next paragraph.) To find it, we go through every ocean point in the grid and find the difference between the value there and the average. Then we multiply this difference by itself (square it -- this avoids the marksman statisticians problem*). Then we add up these squares over every ocean point. This is a big and not very interesting number. More interesting is the average squared error, the mean square error, so we divide by the number of points involved. (Because we are comparing against the average itself, this is also the variance of the temperatures.) Since we think in terms of temperatures and temperature differences rather than squares of them, we take the square root of the mean square error to get the RMSE. This represents the typical size of how far off we expect to be. We could be either warmer or colder by this much, but this is the magnitude.

* Two statisticians went to a shooting range and each fired at the target. The first missed by 1 meter to the left (-1 meter). The second missed by 1 meter to the right (+1 meter). They then congratulated each other on their fine marksmanship because on average they had hit the bullseye. Their average error was indeed zero. But their rms error was 1 meter.
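The RMSE recipe above, written as a minimal Python sketch. `rmse` is a hypothetical helper name, and the temperatures in the example are made up for illustration; they are not values from the NCEP analysis.

```python
import math


def rmse(values, reference):
    """Root mean square error of `values` against `reference`.

    `reference` may be a single number (e.g. the global mean) or a
    sequence the same length as `values`.
    """
    if isinstance(reference, (int, float)):
        reference = [reference] * len(values)
    if not values or len(values) != len(reference):
        raise ValueError("need two non-empty sequences of equal length")
    # Square each difference so misses to the left and right cannot cancel.
    squares = [(v - r) ** 2 for v, r in zip(values, reference)]
    mse = sum(squares) / len(squares)  # mean square error
    return math.sqrt(mse)


# The marksman statisticians: misses of -1 m and +1 m.
misses = [-1.0, 1.0]
print("mean error:", sum(misses) / len(misses))  # 0.0
print("RMS error: ", rmse(misses, 0.0))  # 1.0

# Made-up ocean-point temperatures (deg C) compared with their own average.
sst = [2.0, 10.0, 28.0, 30.0]
print("RMSE vs. the mean:", rmse(sst, sum(sst) / len(sst)))
```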

When I compute the RMSE of using the global mean temperature in place of the full-resolution grid, I get 12 C. That's ... enormous. The difference between water at 20 C (68 F) and 32 C (90 F) is pretty large! So, clearly, we can't be satisfied with an RMSE of 12 C. But now we have a method for judging the resolution we need, and a sense of how bad things can get.

Then I had my program average over smaller boxes than the whole globe, say 90 degrees on a side (London to Chicago, equator to pole), and compute the RMSE of those box averages against the original temperatures in the full-resolution grid. No surprise that boxes that large were pretty bad. But ... once I got down to boxes 2 degrees on a side (something like 200 km, or 120 miles), the RMSE was down to 0.5 C.
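A minimal sketch of the box-averaging experiment, assuming the grid is a list of latitude rows with `None` marking land. Each ocean point is compared with the plain (unweighted) average of the ocean points in its box; whether the original calculation weighted points by area isn't stated, so treat that as an assumption. `box_average_rmse` and the tiny grid are hypothetical.

```python
import math


def box_average_rmse(grid, box):
    """RMSE between each ocean point and the average of its box.

    grid: list of rows (latitude) of values; None marks land.
    box:  box size in grid cells on a side (hypothetical interface).
    """
    nrows, ncols = len(grid), len(grid[0])
    sq_sum, count = 0.0, 0
    for r0 in range(0, nrows, box):
        for c0 in range(0, ncols, box):
            # Ocean values inside this box; boxes at the edges may be partial.
            cells = [grid[r][c]
                     for r in range(r0, min(r0 + box, nrows))
                     for c in range(c0, min(c0 + box, ncols))
                     if grid[r][c] is not None]
            if not cells:
                continue  # all-land box: nothing to represent
            mean = sum(cells) / len(cells)
            sq_sum += sum((v - mean) ** 2 for v in cells)
            count += len(cells)
    return math.sqrt(sq_sum / count)


# A made-up 4x4 "globe": cold rows on top, warm rows below, one land cell.
toy = [
    [1.0, 2.0, None, 4.0],
    [3.0, 4.0, 5.0, 6.0],
    [20.0, 21.0, 26.0, 27.0],
    [22.0, 23.0, 28.0, 29.0],
]
for box in (4, 2, 1):
    print(box, round(box_average_rmse(toy, box), 3))
```

A box covering the whole grid reproduces the "one number for the globe" case, and the RMSE falls as the boxes shrink. On the real 1/12 degree grid, a 2 degree box would be 24 cells on a side.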

This is still definitely not zero, but it isn't bad. A typical satellite instrument used for the job, such as the AVHRR on NOAA-18, has an RMSE of about 0.5 C for a single observation (compared to a buoy's thermometer at about the same location and time). In other words, with boxes 2 degrees on a side, the box average represents what is happening in sea surface temperature about as well as a single satellite observation does. We've also reduced our RMSE by about 95% compared to using only a single number. And even though we've captured 95% of what's going on, we need only 16,200 numbers instead of the 9,331,200 we started with: 95% of the information in the full grid, with only 0.2% as much data.
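The arithmetic behind those two percentages, as a small sketch (the helper names are hypothetical). The grid sizes follow from the box sizes: 4320 x 2160 cells at 1/12 degree and 180 x 90 cells at 2 degrees.

```python
def rmse_reduction(baseline_rmse, new_rmse):
    """Fraction by which new_rmse reduces the baseline RMSE."""
    return 1.0 - new_rmse / baseline_rmse


def data_fraction(n_coarse, n_full):
    """Coarse-grid size as a fraction of the full-resolution grid."""
    return n_coarse / n_full


full = 4320 * 2160  # 1/12-degree global grid: 9,331,200 cells
coarse = 180 * 90   # 2-degree global grid: 16,200 cells
print(f"RMSE reduced by {rmse_reduction(12.0, 0.5):.1%}")  # 95.8%
print(f"data kept: {data_fraction(coarse, full):.2%}")     # 0.17%
```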

We've caught 90% of what is happening (reduced the RMSE by 90%) when the boxes are 6 degrees on a side (600 km, 360 miles). And it's 99% once we're down to boxes only 0.5 degrees (50 km, 30 miles) on a side, which needs only about 3% as many data points as the full grid.
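The same bookkeeping for several box sizes, as a sketch. `global_cells` is a hypothetical helper; it counts every box in a global latitude-longitude grid, land included, which is consistent with the 9,331,200 and 16,200 figures above.

```python
def global_cells(box_deg):
    """Cells in a global lat-lon grid of square boxes box_deg degrees on a side."""
    nlon = round(360 / box_deg)  # round() guards against float error, e.g. 1/12
    nlat = round(180 / box_deg)
    return nlon * nlat


FULL = global_cells(1 / 12)  # the 1/12-degree analysis grid

for box_deg in (6, 2, 0.5):
    n = global_cells(box_deg)
    print(f"{box_deg:>4} deg boxes: {n:>7,} cells, {n / FULL:.2%} of the full grid")
```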

Now, let's translate this back to some situations we might care about. In trying to construct climatologies of sea surface temperature, we run into the problem that as we go back in time, there are fewer and fewer observations. But if we have 1 observation in each box 6 degrees on a side, we've captured 90% of what is happening in the sea surface temperature. In other words, a much sparser data set than we might imagine could represent an awful lot of what is happening in the ocean. A global grid at 6 degrees resolution has only 1800 boxes, so we need only 1800 observations to fill it in our simple-minded way.
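One way to read "fill it in our simple-minded way": drop each observation into the 6 degree box that contains it and average whatever lands in each box. The sketch below does that; the function names, the (lat, lon, sst) tuple layout, and the sample observations are all hypothetical, and it makes no attempt to fill empty boxes.

```python
from collections import defaultdict


def box_index(lat, lon, box_deg):
    """(row, col) of the box containing a point.

    lat in [-90, 90]; lon in either the -180..180 or 0..360 convention.
    """
    # Clamp so that lat = 90 falls in the top row rather than past it.
    row = min(int((lat + 90) // box_deg), round(180 / box_deg) - 1)
    col = int(((lon + 180) % 360) // box_deg)
    return row, col


def grid_observations(obs, box_deg=6):
    """Average sparse (lat, lon, sst) observations into boxes of box_deg degrees.

    Boxes with no observation are simply absent from the result.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, sst in obs:
        key = box_index(lat, lon, box_deg)
        sums[key] += sst
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up observations: two share a 6-degree box in the tropical Pacific.
obs = [(10.0, -150.0, 27.1), (11.5, -147.0, 26.9), (-40.0, 20.0, 14.2)]
boxes = grid_observations(obs)
print(len(boxes), "of", (360 // 6) * (180 // 6), "boxes filled")
```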

At 2 degrees resolution, we've captured 95% of what happens in sea surface temperature (at least by this quick little glance: I only looked at 1 day, as analyzed by 1 center, etc.). So if we had a good global ocean model at 2 degree resolution, we'd actually be pretty far along toward predicting sea surface temperatures (modeling climate, etc.) well. In practice, though, some processes that happen over areas smaller than a 2 degree box can change the whole box's average, so we want finer resolution than 2 degrees. More about that in a different post.

In thinking about observing systems: if we only 'need' 1 observation every 200 km or so, and we have satellites that can take an observation every 4 km (like the one above), we're all done, right? Unfortunately, no. The problem is that this satellite can't see through clouds. If there are clouds (and cloudy areas can easily stretch for 1000 km), the satellite can't see the sea surface to tell us what the temperature is down there. So we need other data sources (ships, buoys, and other sorts of satellites that can see through clouds) to fill in even just the 200 km (2 degree latitude-longitude) boxes each day. Plus we need to observe the ocean detail involved in those smaller-scale processes I mentioned. That detail isn't just important for models; fishing cares about it too.
