29 November 2010

Verifying forecasts 2

As I said last week, verifying predictions is difficult, and I was prompted into looking again at the matter by someone doing it wrong.  Of course the standard of 'wrongness' involved is mine.  Forecast verification is something of an art as well as mathematics and science.  But here are some points on which I think I'll get little argument from Allan Murphy* and his intellectual colleagues and descendants:
  • You have to be clear what you're forecasting
    • what variable
    • at what time (or time span)
    • for what place or area
  • You have to be clear how the forecast is going to be evaluated
  • You should evaluate all forecasts
  • Forecasts must be public
  • Forecasts must be verifiable
That last might seem a little strange.  I hope not.  Suppose I said next July 20th at 3:34 PM at Washington National Airport the official temperature would be hot.  Very specific about what I'm forecasting and what it will be evaluated against.  But what is 'hot'?  To me, anything over 80 F (27 C).  As such, it's a near certainty that my forecast will be correct.  It's also awfully easy for me, on July 21st, to say, regardless of the temperature, that it was 'hot'.  This is one reason that we prefer numbers in science.  You can, and we do, work with qualitative predictions.  But it takes more work, as you have to find some way of making 'hot' objective, so that we can all agree that such a forecast was correct or not.

In general, if not as universal, we add a couple more items, at least desirable if not mandatory:

  • Forecasts should specify their degree/nature of confidence
  • It's a good idea to compare the quality of your prediction against a null forecast method (not a personal comment; it means any method that doesn't know any of the science -- like straight line regression, or persistence; this also goes by the principle of 'check how wrong you could be', which I'll illustrate later this week).
  • A trivial matter (except that it comes up in Watts' Nov 23 2010 response to greenman3610) is that all predictions depend on what really happens.  Of course if it's colder, there'll be more ice, and if it's warmer there'll be less.  That's what you're supposed to be predicting!
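As a minimal sketch of the null-forecast comparison, assume 'persistence' simply means repeating last year's observed value. The function names are made up for illustration, and the numbers are the JAXA minimum-day extents quoted later in this post (2009 observed 5.25, 2010 observed 4.81, and the 5.5 forecast of June 23rd).

```python
# Sketch: compare a forecast against a persistence null forecast.
# Assumption: "persistence" here means "this year equals last year".

def abs_error(forecast, observed):
    """Absolute forecast error, in the same units as the inputs."""
    return abs(forecast - observed)

def beats_null(forecast, null_forecast, observed):
    """True if the forecast is closer to the observation than the null."""
    return abs_error(forecast, observed) < abs_error(null_forecast, observed)

# JAXA minimum-day extents quoted in this post, million km^2
observed_2010 = 4.81
persistence = 5.25   # 2009 observed value reused as the null forecast
forecast = 5.5       # the June 23rd forecast discussed in this post

print(round(abs_error(forecast, observed_2010), 2))      # 0.69
print(round(abs_error(persistence, observed_2010), 2))   # 0.44
print(beats_null(forecast, persistence, observed_2010))  # False
```

Other null methods (a straight-line trend, say) would give a different baseline, so the choice of null should be stated up front along with everything else.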
Now, what prompted this was a video by greenman3610 and the response from Watts Up With That.  greenman observed a very bad forecast coming from WUWT; Watts said it was really rather good.  Figuring out good vs. bad isn't really a scientific question, and those aren't really the words used by either, so be fair to both.  Those are my words, but I think they capture fairly the sense of their respective comments.

This is all regarding sea ice.  You can check my original comments from June on my May estimates -- that they were for September average, Arctic, sea ice extent, as measured by NSIDC.  Further, at least in what we submitted to the sea ice outlook, we mentioned what the standard errors in the predictions were.  Don't want it said that I have higher standards for others than I live by myself.

So what was Goddard's prediction?  That turns out to be hard to track down.  Tamino and Neven have also looked into the matter, Neven getting back to February (from his June check).  My selection:

1) On June 6th it is: "Conclusion : Based on current ice thickness, we should expect September extent/area to come in near the top of the JAXA rankings (near 2003 and 2006.) However, unusual weather conditions like those from the summer of 2007 could dramatically change this. There is no guarantee, because weather is very variable."
-- this does tell us what the verification data source is supposed to be, but not whether it is monthly average or daily minimum.  Fairly clearly it is September.  September's minimum and average for 2003, from JAXA, were (6.03, and 6.13) million square km.  September 2006 showed (5.78, 5.91).  The nearest to both would be their average, giving his June 6, 2010 forecast(s) as 5.905, 6.02 million square km for minimum day and monthly average, respectively.

It is not until comments at his personal blog (separate from WUWT) in September that it becomes clear to me that Goddard means the minimum day, not the monthly average.  JAXA's minimum September 2010 day is 4.81 million square km.  So Goddard's June 6 forecast is off by over 1 million km^2.  He gave no sense of variability at this point, but I'll observe my own prior estimate of 0.5 million km^2 for natural variability.  So the error is about 2 standard deviations.  (Aside: that others were off by as much or more does not affect our evaluation of Goddard's predictions.  n.b., I was not one of those others.)
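The June 6th arithmetic can be sketched as follows, using only the JAXA numbers quoted above. Reading 'near 2003 and 2006' as the midpoint of the two years is an interpretation, not something Goddard stated, and the variable names are illustrative.

```python
# Tuples are (minimum day, September monthly average), million km^2, from JAXA.
jaxa_2003 = (6.03, 6.13)
jaxa_2006 = (5.78, 5.91)
observed_2010_min = 4.81
natural_variability = 0.5   # the post's prior one-sigma estimate, million km^2

# Assumption: "near 2003 and 2006" is taken as the midpoint of the two years.
forecast = tuple((a + b) / 2 for a, b in zip(jaxa_2003, jaxa_2006))

# Error of the minimum-day forecast, in km^2 and in units of natural variability
error = forecast[0] - observed_2010_min
print([round(f, 3) for f in forecast])         # [5.905, 6.02]
print(round(error, 3))                         # 1.095
print(round(error / natural_variability, 1))   # 2.2
```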

2) On June 14th, the forecast has changed to "Conclusion : 2010 minimum extent is on track to come in just below 2006. With the cold temperatures the Arctic is experiencing, the likelihood of a big melt is diminishing."
Ok, what does 'just below' mean?  About the same as my 'hot', perhaps.  2006's minimum day at JAXA was 5.78 million km^2, September average of 5.91.  Observed 2010 was (4.81, 5.10).  I'm hard-pressed to call errors of (+0.97, +0.81) million square km 'just below', but Goddard never defined the term.  (Hence that guide on verifying forecasts!)

3) On June 23rd the forecast becomes:
"I’m forecasting a summer minimum of 5.5 million km², based on JAXA. i.e. higher than 2009, lower than 2006."
This is the first time he directly names a specific number for the ice (well, one assumes extent, but he doesn't say here whether it's extent or area he means; nor whether it's minimum day or monthly average).  2009's JAXA numbers are (5.25, 5.38) for minimum day and monthly average extent, respectively.  2006 are (5.78, 5.91).  5.5 lies between the 2009 and 2006 values for either the minimum day or the monthly average, so this didn't help clarify which he meant.  His September comments did (minimum day), and this comment is also more clearly consistent with minimum day (0.25 above 2009, 0.28 below 2006, versus being much closer to the 2009 monthly average than the 2006 monthly average).  This also gives us a sense of his level of uncertainty -- 0.25 million km^2.  If he were more uncertain than that, he would give a wider range of extents.  Whether that's one or two 'sigma' is also not clear, and, again, points to why we like these things specified.
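To make the 'which quantity did he mean' check concrete, here is a small sketch that measures how far 5.5 sits from the 2009 and 2006 values for each candidate measure. The dictionary layout and names are illustrative; the numbers are the JAXA values quoted above.

```python
# JAXA extents quoted above, million km^2
jaxa = {
    "minimum day":     {"2009": 5.25, "2006": 5.78},
    "monthly average": {"2009": 5.38, "2006": 5.91},
}
forecast = 5.5

# For each measure: (distance above 2009, distance below 2006)
gaps = {}
for measure, years in jaxa.items():
    above_2009 = round(forecast - years["2009"], 2)
    below_2006 = round(years["2006"] - forecast, 2)
    gaps[measure] = (above_2009, below_2006)
    print(measure, above_2009, below_2006)
# minimum day 0.25 0.28
# monthly average 0.12 0.41
```

A forecast sitting roughly midway between the two reference years fits the minimum-day reading; for the monthly average it would sit much nearer 2009.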

I'll note that in following this up, I read every one of the WUWT 'sea ice news' posts from #2 to #30, as well as an August midweek update, and all 'verification' posts at Goddard's.  Plus some, but not all, comments in August's posts.  This matters some for what follows.

From July 4th through a comment of his on his own WUWT post August 24th, Goddard continues with 5.5 million square km being his prediction.   Quoting his comment (with date and time so you can find it; I've never figured how to link straight to comments):
stevengoddard says:
August 24, 2010 at 9:46 am
Remember that NSIDC took a mulligan, changing their forecast in July. They started at 5.5 million.
I haven’t taken my mulligan yet ;^)

So at least as late as the 24th, 5.5 is his prediction and he's taking pride in having not changed his forecast, when talking to WUWT readers.  That's odd, because in the August Sea Ice Outlook, whose due date for submission was mid-month (I did submit to it myself, on time), his prediction was 5.1 million km^2 for September monthly average at NSIDC.  As I mentioned before, JAXA runs about 0.2 million above NSIDC, so a 0.4 million square km drop doesn't make sense.  On top of that, the Outlook figure is a monthly average (which, at JAXA, runs about 0.15 million km^2 above minimum day; the gap between NSIDC's monthly average and minimum day is larger -- about 0.30 million km^2 this year).

To back that out:  If the prediction was 5.5 for minimum day according to JAXA, subtract 0.2 to get NSIDC's minimum day, and then add 0.3 to get the September (NSIDC) average extent.  If his prediction hadn't fundamentally changed, the SEARCH submission should have been 5.6 million km^2.  Since it was 5.1 instead, there's a rather large change.  Surely worthy of a post of its own at either WUWT or his own blog.  In any case, given his August 24th comment, it had to be between then and the 31st.  (At least if it's going to be called an August prediction, which he does.)  That, or he was telling WUWT readers different things on the 24th than he was telling the Sea Ice Outlook.  Or the Outlook let him submit late, or ... -- the point being, this shows why it is we want our forecasts to be clearly public.
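The conversion above can be written out as a short sketch. The two offsets are the approximate 2010 values quoted in the text; the constant and function names are made up for illustration.

```python
# Approximate offsets quoted in the text, million km^2
JAXA_MINUS_NSIDC = 0.2          # JAXA runs about 0.2 above NSIDC
NSIDC_AVG_MINUS_MIN_DAY = 0.3   # NSIDC September average above its minimum day (2010)

def jaxa_min_day_to_nsidc_sept_avg(jaxa_min_day):
    """Translate a JAXA minimum-day extent to an NSIDC September average."""
    nsidc_min_day = jaxa_min_day - JAXA_MINUS_NSIDC
    return nsidc_min_day + NSIDC_AVG_MINUS_MIN_DAY

equivalent = jaxa_min_day_to_nsidc_sept_avg(5.5)
print(round(equivalent, 2))         # 5.6
print(round(equivalent - 5.1, 2))   # 0.5, the size of the unannounced change
```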

August 29th Goddard is still referring only to his 'June' forecast of 5.5 million km^2.  No mention of an August prediction.

August 30th, Steven Goddard started blogging regularly at his own blog rather than WUWT.

On August 31st he seems to still like his 'June' forecast (actually, at least the 3rd forecast from June, the one on June 23rd) as he says:
"The video below shows current ice (thin red line) my June forecast (dashed line) and NSIDC’s forecast summer minimum (red horizontal line.)  Who do you think is going to be closest?"
-- and there is no mention of an August prediction.  Note, too, he doesn't mention that NSIDC is predicting a different thing than he is.

The first I see Goddard directly referring to his 'August forecast' is September 7th.  It is mentioned at WUWT the day before by Watts.  Can you really call something that doesn't deserve a main-post mention until the end of the first week of September a prediction of September?  Ok, maybe I missed the post in which it showed up.  But clearly, given his August 24th and 29th comments, his prediction of 5.1 million square km doesn't surface publicly until after the morning of the 29th.

JAXA's observed ice cover on August 23rd was 5.60 million km^2 (last observation he'd have been able to look at in commenting on the 24th).  24th was 5.55.  August 31st was 5.33 (already 'busting' all of his 'June' forecasts).  The Sea Ice Outlook was released September 1st, so in the last week of August, apparently, after the June forecasts were busted, Goddard made a revised forecast.  (See point of 'how wrong could you be' above; I'll make it its own note later this week.  The answer is, for JAXA, not very if you get to predict the seasonal minimum day from August 23rd.)

It is also with the post of September 7th that I (finally) can be positive that Goddard means to verify minimum day's ice extent as computed by JAXA:
"My June forecast of 5.5 million km² (JAXA) is currently off by 7%."
-- you can't make that statement if you mean monthly average. Who knows what's going to happen the rest of the month?

So at last, I'll return to greenman3610 and Watts' comments on Goddard's prediction(s) made at WUWT.  One part of it is that, fundamentally, greenman3610 is not focused on predictions as such.  His focus is, instead, the months of Goddard talking of sea ice being in recovery, with that belief in recovery driving his predictions of ice extent.  We're looking at at least 6 months of 'sea ice is recovering' posts from Goddard, with numbers or references that compute to numbers from 5.5 million km^2 to over 6 for the extent based on that belief.  Then, in the last 2 days of August, he enters a forecast of 5.1 million km^2, which is less than 2009's 5.25.  It's a recovery, there's just less ice?  I don't follow that reasoning.

As to predictions as such, only the June predictions (between 5.78 and 6.03 June 6th, 'just below' 5.78 June 14th, 5.5 June 23rd; in the first two he was referencing years, I filled in the values for minimum day from JAXA for those years) seem to have been made in notes of their own at WUWT.  It's correct to refer to those as his WUWT forecasts lacking any sighting of a post with the 5.1 there before September, and his clear comment on the 24th of August of 5.5 (still) being his prediction with none others (no 'mulligan' as he called it) existing.  Further, after going through all the posts I could find on the topic, it's clear (see above) that he meant to forecast the minimum day's ice cover as computed by JAXA.   That figure, this year, was 4.81 million km^2. So his last (and, turned out to be, best, and the one he consistently referred to as his forecast from June 23rd to at least August 24th) June forecast was off by almost 0.7 million km^2.

On the other hand, Watts points, in November, to only the final prediction from Goddard, which had to have been made in the last two days of August, that was 5.1 million km^2 for September monthly average computed by NSIDC.  (You can tell by noting the horizontal line in the verification figure from SEARCH that Watts shows is at 4.9 million km^2, vs. JAXA's minimum day of 4.81, or NSIDC's minimum day of 4.60, or JAXA's monthly average of 5.10.)  The 5.1 is not so far off from 4.9.  At least SEARCH is more or less clear (not clear enough, I think, that'll be a different email) that this is what they're looking for.  Not clear at all to me that either Watts or Goddard realize the different quantities being forecast or verified with.  Pointing to only one of multiple forecasts violates one of the forecast verification principles I mentioned above. 

Was Goddard's forecast pretty good?  Pretty bad?  Off by 0.2, 0.3, 0.7, 1.1 million km^2?  Who knows.  We can get all those results and more by varying what we take to be his forecast and what we choose to verify against.  That's why it's such a central point that you say just what you're predicting (guessing, estimating, ...) and how it's to be validated.
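A rough sketch of how those different answers arise: pair each candidate forecast with each candidate verification value and take the difference. The pairing of numbers to labels is one reading of the figures quoted in this post (the 4.9 is the NSIDC September average read off the SEARCH figure), and the labels are illustrative.

```python
from itertools import product

# Candidate forecasts and verification values, million km^2, as quoted in this post
forecasts = {
    "June 6 midpoint": 5.905,
    "June 23": 5.5,
    "late August": 5.1,
}
observations = {
    "JAXA minimum day": 4.81,
    "NSIDC September average": 4.9,
}

# Absolute error for every forecast/verification pairing
errors = {}
for (f_name, f), (o_name, o) in product(forecasts.items(), observations.items()):
    errors[(f_name, o_name)] = round(abs(f - o), 1)
    print(f"{f_name} vs {o_name}: {errors[(f_name, o_name)]}")
# June 6 midpoint vs JAXA minimum day: 1.1
# June 6 midpoint vs NSIDC September average: 1.0
# June 23 vs JAXA minimum day: 0.7
# June 23 vs NSIDC September average: 0.6
# late August vs JAXA minimum day: 0.3
# late August vs NSIDC September average: 0.2
```

Only some of these pairings compare like with like, which is the point: without a stated forecast quantity and verification source, any of them can be claimed after the fact.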

Ok, so all good fun in seeing why we want to do forecast verification in the direction that I like rather than waiting until afterwards to figure out what number will be compared against what other number.  There was one other point of contention between Watts and greenman3610 -- the business of Goddard talking of recovery or not.  From August 9th, for example, we see Goddard (he, or Watts, italicized it, so I'll follow suit) saying:
Can we find another year with similar ice distribution as 2010? I can see Russian ice in my Windows. Note in the graph below that 2010 is very similar to 2006. 2006 had the highest minimum (and smallest maximum) in the DMI record. Like 2010, the ice was compressed and thick in 2006. Conclusion : Should we expect a nice recovery this summer due to the thicker ice? You bet ya..

-- The DMI record is even shorter than JAXA, starting in 2005, vs. 2002.  Given that we want 30 years for climate purposes, either is too short for much use, except to cherry pick for 'highest in the record'.  Kind of like being the tallest person in my house.  Sure, I am.  But there aren't many of us here.  The satellite period as a whole only begins in October 1978, so even taking the whole period is pretty short.

Anyhow, there's a kerfuffle between greenman3610 and Watts regarding whether there was a 'guarantee' of recovery in Goddard's comment.  You decide (go read the whole note, of course, if you're going to).

I'm hard pressed, to return to science, to see 2010 extent being below 2009 and all years 2006 and before that we have data for, as 'recovery'.  And that takes us back to what constitutes a forecast and how you'd verify it.

* ok, going way back for those who remembered that asterisk.  Allan Murphy was one of the major figures in meteorological forecast verification.  One of the people I discussed verification with fairly often had learned a fair part of what he knew from Murphy.  For the major 'small world' effect, I have a couple of textbooks that Murphy used himself.  (One is 'Strength of Materials', so I guess that he started out in engineering, as I had.)  Anyhow, if you were to read all his papers on verification, you'd be quite knowledgeable indeed.

I try to tell people when I write about their work.  But I could not find any email contact for Goddard at his blog.  I sent a note (available on request) via Watts' contact page 9:45PM Eastern time 24 Nov asking a couple questions and notifying him of this post's scheduled Monday appearance, and to greenman3610 about the same time.  Watts couldn't answer my questions, but did forward (he said evening of the 24th) my note to Goddard.  (No surprise that he couldn't -- they were about Goddard's actions and knowledge.)  Watts, in his response to me mentioned a prediction by Bastardi at WUWT.  It illustrates another bunch of violations of the principles I mention above, so gives another chance to discuss how to do forecast verification.  That'll appear later this week, as well as a consideration of how wrong could you be if you wait until the last week of August to make your prediction of the seasonal minimum.


Hank Roberts said...

> how to link straight to comments

1) right-click on the date and time stamp; it will highlight; "copy link" and paste that in your post, thus:
2) mouse-drag-copy the date and time, then /View/Source and use 'Find' there; paste in the date and time; look around that and copy out everything between the start and stop html codes to get the URL and the readable timestamp linked to it (you'll have to use 'view source' on this page to see it) -- you get

August 24, 2010 at 9:46 am

Will Crump said...

Robert Grumbine:

The June 3, 2010 analysis of "When will Arctic ice be gone?" at: http://moregrumbinescience.blogspot.com/2010/06/when-will-arctic-ice-be-gone.html

and the June 1, 2010 analysis of "Sea Ice Estimations" at:


may be flawed as it appears they treated the Arctic ice extent as if it were a single block of ice. This may not be a valid approach as there are distinct seas and regions that make up the Arctic ice. The regions are at different latitudes and have distinct forces that affect their ice compared to the Arctic Basin region.

Cryosphere Today has a regional analysis that divides the Arctic into 14 separate regions. In looking at the September 2010 minimum, there was only a relatively small amount of ice remaining in regions other than the Arctic Basin. These other regions (particularly the regions along the Siberian coast) have experienced a greater percentage reduction of ice in prior years than the Arctic Basin, and will not be able to contribute to future declines in the Arctic ice as there was very little ice left to melt in these regions. These regions may be distorting the trend lines in the statistical analysis.

Because these other regions have already reached a "no ice" level at the September minimum they should not be included in the statistical analysis.

What I am suggesting is that the statistical analysis for when Arctic ice will be gone and the curve for the trend in average September ice extent should only be done with respect to the Arctic Basin region indicated in this link:

This region appears to have been at about 77% of the 1979 to 2008 mean level in September of 2010
(negative anomaly of .75 million and remaining area of 2.5 million so 2.5/3.25 = 77%). Perhaps the University of Illinois Polar research group can provide you with a data base to perform this analysis. I suspect that if statistical analysis only takes the Arctic Basin into consideration, the downward slope of the trend line in the June 1st post will not be as steep and the date when the Arctic will be "ice free" in September will extend considerably beyond the 2035 date indicated in the June 3rd post.

The Arctic Basin will be the last region of Arctic ice that will become ice free in September. As such, the statistical analysis should only be done with respect to this region.

Will Crump said...

Robert Grumbine:

I have found a data source for the Central Arctic Ocean ice area from 1981 to 2009. It is at the ARCUS site link below which has a link to a pdf of a paper by Adrienne Tivy:


The paper by Adrienne Tivy has historical regional observations in figure 8 of the report for the "Central arctic Ocean". If you apply your statistical analysis to the central arctic data, when does it become ice free?

Peter said...

William: I have a pencil that is 10cm long. I am sharpening it away at the rate of 1cm per year. It is currently 5cm long. When will it all be gone?

Clearly it would be wrong to include the regions nearest to the point of the pencil to do my analyses, as these have distinct forces acting on them compared to the region nearest the other end. To wit: a pencil sharpener.

Thus, I should only take into account the final 1cm of pencil when doing my analysis. Fortunately, it's easy to see that this cm of pencil has thus far shown no change in length whatsoever.

Conclusion: the pencil will never be used up.

Will Crump said...


Arctic ice grows and shrinks. The pencil only gets shorter. The analogy does not work.

Use the remaining portion of your pencil to draw a linear trend line for the September central Arctic ice area using the information in figure 8 in the link below.

When does the trend line for the central Arctic go below 20% of the 1979 to 2000 average (Maslowski's definition of ice free)?


(Note: this graph does not include the 2010 September data point)

The arctic ice extent will continue to decline, but it will not decline as fast as the current NSIDC September trend line or the volume trend line drawn by Wieslaw Maslowski.


The thinning Arctic ice will increase the variability in the anomaly as noted at this blog:


but the ice will persist in September in the central Arctic Basin longer than NSIDC trend line for the Arctic as a whole would indicate.


Once Hudson Bay freezes up, the Northern Hemisphere area anomaly maintained at Cryosphere Today will go below 500,000km2. This should occur by the end of February.

Will Crump said...


You miss the point, it is not one Arctic and one pencil, it is several Arctic regions and several pencils.

Start with 5 10 cm pencils. Each pencil is attached to a separate sharpener that operates at a constant rate. Four have been sharpened at a rate of 1 cm per year for 9 years until only 1 cm remains. The remaining one has been sharpened at a rate of .1 cm per year for 9 years and is 9.1 cm long. If you set up a graph that combines all the pencils, it would show the loss of 4.1 cm per year and you would conclude that the pencils would be gone in just over 3 years.

Yet, if the rate is not changed on any of the sharpeners, the fifth pencil will still be around in 90 years.

Conclusion: A graph that treats the five pencils as if they are a single pencil is a flawed method of predicting future pencil loss.
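The five-pencil arithmetic can be checked with a short sketch. It uses only the numbers given in the example above; the variable names are illustrative, and the 'combined' view is simply total remaining length divided by total loss rate.

```python
# Five pencils, each starting at 10 cm, sharpened at constant rates for 9 years
rates = [1.0, 1.0, 1.0, 1.0, 0.1]          # cm per year
lengths = [10.0 - 9 * r for r in rates]    # remaining length of each pencil, cm

# Combined view: treat all five as a single pencil losing 4.1 cm per year
combined_years = sum(lengths) / sum(rates)

# Separate view: each pencil runs out on its own schedule
separate_years = [length / rate for length, rate in zip(lengths, rates)]

print(round(combined_years, 1))        # 3.2
print(round(max(separate_years), 1))   # 91.0
```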

Grab a pencil and draw a trend line for the central Arctic September extent data in figure 8 of the following link and let me know when the central Arctic will be ice free.


The full report is at:


Robert Grumbine said...


I'm sorry about the lag. Your comments (quite a few at least) made it to me for moderation, but I haven't looked in on the blog for a long time. I've tried to bring through the main points you have.

The central point between you, Peter, and me, is the question of how many independent pieces the Arctic has. You're correct that my analysis, and Peter's, treats the Arctic as one unit. I think you're wrong in saying that we're wrong to do so. That assertion is no help, of course. To go much beyond that point takes us to an interesting, to me at least, part of doing science. Interesting enough that I will make it a full post.

In the mean time, I'll suggest that the Arctic, or central Arctic, or 'Arctic as defined by William Chapman in his figures', etc., is fundamentally not independent of the other sea ice regions. Namely, the heat to melt Arctic (however defined) sea ice comes from elsewhere -- lower latitudes in the atmosphere and ocean. If those other areas, say the Bering Sea, are ice-free, then the ocean and atmosphere has more energy available to melt points farther north, like the Chukchi sea. If the Chukchi is ice-free, then the energy is available to melt central Arctic sea ice.

Will Crump said...

Sorry for the multiple posts, I was not certain that they were getting through and thought the problem was on my end. Thanks for deleting the repetitious posts and for your response.

I would still be interested in seeing your statistical technique applied using only data for the central Arctic Basin. I respect your technique and do not have the skills to do this on my own.

I agree with your statements that the central Arctic should see faster declines due to the changing conditions in the regions that border this region, but I do not see any indication that future declines in the central Arctic will match the rate observed for the Arctic as a whole and I do not understand how the June 3rd analysis compensates for this.

The observed rate of decrease in the central Arctic Basin ice area at the minimum has been much slower than the rate for the Arctic as a whole. While this region dipped to 2.1 million km2 at the minimum in 2007, a substantial decrease from the "near normal" level of 2.65 million km2 for 2006, this result was due more to wind conditions than due to melting.

Part of the proof for this statement is that the central Arctic basin has been able to maintain an area in excess of 2.4 million km2 at the minimum for the last three years - see figure 8 in the link below (which does not include the 2.5 million km2 data point for 2010).


There is no indication that the sea ice area at the maximum for the central Arctic Basin will go below its historical average, regardless of the minimum level. The central Arctic Basin reached its maximum extent of 4.2 million km2 at the beginning of December of 2010.


I predict (perhaps guess would be a more appropriate word) that observations over the next three years will show a reduced rate of decline for the Arctic as a whole as the "easy to melt ice" has already reached the zero extent - zero volume level at the minimum. What remains is ice at latitudes further north, which will not melt as fast as the ice to the south. I believe this explains why ice extent levels have not set a new minimum level after the extreme minimum reached in 2007.

There may be greater volatility in the minimum level for the central Arctic Basin, but the current trend does not appear to support a permanent ice free state for the central Arctic as quickly as it will occur for other regions.

Will Crump said...

Peter and Robert:

The static pencil analysis and the view that the central Arctic Basin is no different from other regions fails to take into account the dynamic processes of Arctic ice movement.

The ice in the central Arctic Basin, particularly near the North Pole is not significantly thicker than the ice at the winter maximum in the regions surrounding the central Arctic. So how is it that ice extent in the central Arctic region does not show a rate of decline similar to its surrounding regions?

The answer is that the Arctic is not a static sheet of ice that melts in place like a pencil being sharpened. The ice that covers the central Arctic Basin at the summer minimum is not the same ice that was at this location at the winter maximum. (This is proved by the tracks the buoys make in the Arctic.) The ice at the summer minimum in the central Arctic Basin includes ice that is transported into it from the regions surrounding the central Arctic Basin. It is this "fresh" supply of ice that maintains the central Arctic Basin ice extent and makes it different from the surrounding regions.

The quote below describes this ice transport feature from the Laptev Sea:

"The Laptev Sea is a major source of arctic sea ice. With an average outflow of 483,000 km2 per year over the period 1979–1995, it contributes more sea ice than the Barents Sea, Kara Sea, East Siberian Sea and Chukchi Sea combined. Over this period, the annual outflow fluctuated between 251,000 km2 in 1984–85 and 732,000 km2 in 1988–89. The sea exports significant amounts of sea ice in all months but July, August and September."

Alexandrov, V.Y.; Martin, T., Kolatschek, J., Eicken, H., Kreyscher, M., Makshtas, A.P. (2000). "Sea ice circulation in the Laptev Sea and ice export to the Arctic Ocean: Results from satellite remote sensing and numerical modeling". Journal of Geophysical Research 105 (C5): 17143–17159. doi:10.1029/2000JC900029. http://www.gi.alaska.edu/~eicken/he_publ/AMKEKM00.pdf.

The statistical analysis for the Arctic as a whole does not take into account the impact of ice transport from surrounding regions into the central Arctic Basin.
Please do the statistical analysis using the 4.25 million km2 area of the Arctic Basin region alone and see what date for an "ice free" Arctic emerges. You can apply a curved trend line in doing the analysis if this provides a better fit than a straight line.

Oale said...

I once thought God was the Ultimate Statistician who makes the specific mistake inherent in all statistics. This means that if we (by chance) do perfect statistics, we make a mistake by not allowing mistakes... I'm not sure of the accuracy of this thought though.

Oale said...

And that should have gone to: http://initforthegold.blogspot.com/2011/01/science-for-everybody-159.html
talk about accuracy...