Sunday, September 21, 2008

Excess Precision

Excessive precision is one of the first methods mentioned in How to Lie With Statistics. It's one that my wife (a nonscientist) had discovered herself. It's very common, which makes it a handy warning signal when reading suspect sources.

In joke form, it goes like this:
Psychology students were training rats to run mazes. In the final report, they noted "33.3333% of the rats learned to run the maze. 33.3333% of the rats failed to learn. And the third rat escaped."

If you didn't at least wince, here's why you should have. In reporting scientific numbers, one of the things you need to do is represent how good the numbers are. In order to talk about 33.3333% of the rats, you'd have to have a population of a million rats or more. 33.3333% is saying that the figure is not 33.3334% or 33.3332%. You should show only as much precision as you have data for. Even though your calculator will happily give you 6-12 digits, you should be representing how accurate your number is. In the case of the rat problem, if 1 more rat had been run, one of those 33% figures would change to 25 or 50. The changes of +17% or -8% are so large that the figures should not even have been reported at the 1% level of precision. What the students should have done was just list the numbers, rather than percentages, of rats all along.
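For anyone who wants to check the extra-rat arithmetic, here is a minimal sketch in Python. The function name percent is my own choice for illustration; the only inputs are the 3 rats from the joke and 1 imagined extra rat.

```python
def percent(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


# The joke: 1 of 3 rats learned the maze.
before = percent(1, 3)

# Run one more rat. It either learns (2 of 4) or fails (1 of 4).
if_it_learns = percent(2, 4)
if_it_fails = percent(1, 4)

# How far does the reported percentage move in each case?
shift_up = if_it_learns - before
shift_down = if_it_fails - before

print(f"before: {before:.4f}%")
print(f"one more success: {if_it_learns:.0f}% (change {shift_up:+.0f})")
print(f"one more failure: {if_it_fails:.0f}% (change {shift_down:+.0f})")
```

A single rat moves the answer by 8 to 17 percentage points, so the digits after the decimal point in 33.3333% carry no information at all.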

As a reader, a useful test is to compare how large the population is with how many digits are reported in the percentages. Every digit in the percentage requires 10 times as large a population: you need 10 for the first digit (again, the psych. students shouldn't have reported percents), 100 for the second, and so on. A related question is 'how much would the percentages change with one more success/failure?' This is what I looked at with running the extra rat.
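The ten-times-per-digit rule of thumb is easy to turn into a quick check. This is only a sketch of the rule as stated above, not a statistical test, and the function names (justified_digits, digits_reported, looks_overprecise) are made up for this example.

```python
def justified_digits(population):
    """Digits a percentage can carry under the rule of thumb:
    10 individuals for the first digit, 100 for the second, and so on."""
    if population < 1:
        raise ValueError("population must be at least 1")
    # Counting the digits of the population avoids floating point logarithms.
    return len(str(int(population))) - 1


def digits_reported(percent_text):
    """Count the digits written in a percentage such as '33.3333',
    ignoring leading zeros. Trailing zeros count: the writer chose to show them."""
    digits = "".join(c for c in percent_text if c.isdigit())
    return len(digits.lstrip("0"))


def looks_overprecise(percent_text, population):
    """True when more digits are shown than the population can support."""
    return digits_reported(percent_text) > justified_digits(population)


# The rat report: 6 digits shown, population of 3.
print(looks_overprecise("33.3333", 3))
# The same figure would be fine with a million rats.
print(looks_overprecise("33.3333", 1_000_000))
```

With 3 rats the rule allows zero digits, which is another way of saying the students should have listed counts rather than percentages.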

A related point is to consider how precise the numbers involved were at the start. When I looked at that bogus petition, for instance, I reported 0.3 and 0.8%. Now the number of signers was given in 4 or 5 digits. That would permit quite a few more digits than the 1 I reported. The reason for only 1 is that I was dividing the number of signers by the size of the populations (2,000,000 and 800,000) -- and the population numbers looked like they'd been rounded heavily, down to only 1 digit of precision. When working with numbers of different precisions, the final answer can only have as many digits of precision as the worst number in the entire chain.
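Here is a small sketch of that worst-number-in-the-chain rule in Python. The signer count below is HYPOTHETICAL, invented only to have a 4-digit number to divide; it is not the petition's actual figure. The helper round_sig is also my own, not a library function.

```python
from math import floor, log10


def round_sig(value, sig_digits):
    """Round value to the given number of significant digits."""
    if value == 0:
        return 0.0
    exponent = floor(log10(abs(value)))
    return round(value, sig_digits - 1 - exponent)


# HYPOTHETICAL signer count, known to 4 digits (made up for illustration).
signers = 6213
signers_digits = 4

# Population as quoted in the post, apparently rounded to 1 digit.
population = 2_000_000
population_digits = 1

# The calculator answer carries far more digits than the inputs justify.
raw_percent = 100.0 * signers / population

# The result is only as good as the least precise input.
usable_digits = min(signers_digits, population_digits)
reported = round_sig(raw_percent, usable_digits)

print(f"calculator says {raw_percent}%, honest report is {reported}%")
```

The point of taking the minimum is that the 4 good digits in the signer count cannot rescue the 1 good digit in the population.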


An example, and maybe the single most commonly repeated one from climate, is this page, which gives (variously, but table 3 is the piece de resistance) the fraction of the greenhouse effect due to water vapor as 95.000%. That's a lot of digits!

Let's take a look at the sources he gives, and then think a little about the situation to see whether 5 digits of precision is reasonable. Well, the sources he has valid links for (1 of the 9 is broken, and one source doesn't have a link; I'll follow that up at lunch at work in a bit) certainly don't show much precision. Or much science, for that matter (news opinion pieces and the like). My favorite is 21st Century Science and Technology (a LaRouche publication), whose cover articles include "LaRouche on the Pagan Worship of Newton". The figures given are 96-99% (LaRouche mag), 'over 90%', 'about 95%', and the like. Not a single one gives a high precision 95.000%, or a high precision for any other figure. This should have been a red flag to the author, and certainly is to us readers. Whatever can be said about the fraction of greenhouse effect due to water vapor, it obviously can't be said with much precision. Not if you're being honest about it. (We'll come back in a later post to what can be said about water vapor, and it turns out that even the lowest of the figures is too high if you look at the science.)

Now for a bit of thinking on water vapor. The colder the atmosphere is, the less water vapor there can be before it starts to condense. (It's wrong to call it the atmosphere 'holding' the water vapor, but more in another post.) It also turns out to vary quite a lot depending on temperature. In wintertime here (0 C, 32 F being a typical temperature), the pressure of water vapor varies from about, say, 2 to 6 mb. In summer, it's more like 10 to 30. (30 million?! It gets very soggy here, though not as much as Tampa.) On a day that it's 30 mb here, it can be 10 mb a couple hundred km/miles to the west. Water vapor varies strongly through both time and space. As a plausibility test, then, it makes no sense for there to be 5 digits precision to the contribution of something that varies by over a factor of 10 in the course of a year, and even more than that from place to place on the planet.

10 comments:

kcsphil said...

This is one of those blog posts that EVERYONE who fights climate change deniers should read. Now, if you can just tweak it a little to be readable for the great, un-science-oriented masses, that would be even better. And I'm off at lunch to get the book in question.

Penguindreams said...

Suggestions as to where it started going astray? (Or at least where it went astray.)

Do get the book. Huff's is an absolute classic, inexpensive, and definitely readable by folks regardless of background.

I'll add, though, that the excess precision flag is one that can be raised against quite a few posts in climate, regardless of what the poster is supporting or attacking.

kcsphil said...

well . . . you're a scientist. I'm a scientist. precision is our language. That said, I had to read it twice to get the full meaning. Let me play with it a while - It just seems like it needs to be redacted or revised somehow to be a tad less . . . scientific.

Penguindreams said...

One thing I know it suffers from is that I try to do two different things, one being the excess precision, the other its demonstration in a climate-related source.

By all means, let it simmer some in your mind and see what comes up.

And, other readers, if you have suggestions for improvements, do bring them up. The more so if you're not a scientist yourself. I want to be clear to the vast majority of the world that aren't scientists.

Bayesian Bouffant, FCD said...

Research shows that if you add an extra decimal place, 17.83% more people will find your results believable.

Penguindreams said...

:-)

One of my favorite bits of net.sarcasm is "87.32% of statistics on the Internet are made up on the spot."

Steve Bloom said...

I think the main problem is the last paragraph. It assumes a bit too much on the part of the reader. (FYI I'm a non-scientist but a long-term denizen of the climate science blogosphere, and so have seen all of this before.) E.g., what do the mysterious "mb"s have to do with the amount of water vapor? This confusion was added to when you said "million" instead of millibar. Also, does the less informed segment of your intended audience already know that 32F is equal to 0 C, and for that matter what F and C stand for? A few more words and sentences would fix this sort of thing. Alternatively, could this paragraph have been the other post?

kcsphil said...

I wasn't sure if you wanted this as a comment or not, but here it is:

Excessive precision is one of the first methods mentioned in How to Lie With Statistics. It's one that my wife (a nonscientist) had discovered herself. It's very common, which makes it a handy warning signal when reading suspect sources.

In joke form, it goes like this:
Psychology students were training rats to run mazes. In the final report, they noted "33.3333% of the rats learned to run the maze. 33.3333% of the rats failed to learn. And the third rat escaped."

If you didn't at least wince, here's why you should have. In reporting scientific numbers, one of the things you need to do is represent how good the numbers are. That’s what scientists refer to as “precision.” And it’s also a useful mental trick. You see, the more numbers appear after the decimal place, the more “precise” we assume the number to be. Most people would say (without being able to say why) that a more precise number should be a more “correct number,” meaning they would trust the number more.

In the example above, in order to talk about 33.3333% of the rats, you'd have to have a population of a million rats or more. Even though your calculator will happily give you 6-12 digits, you should be representing how accurate your number is. You only should be showing as much precision as you have data for. So here, where the students have only 3 rats (not the millions implied by their number), they should have just said that 33% of the rats did “X.” Actually, what the students should have done was just list the numbers, rather than percentages, of rats all along because they have such a small sample size or universe to report from.

As a reader, a useful test is to look for how large the population is versus how many digits they report in percentages. Every digit in the percentage requires 10 times as large a population. So in our current example, you need 10 rats for the first digit, 100 for the second, and so on.

I’ve looked at precision before. In my previous analysis of that bogus petition, for instance, I reported 0.3 and 0.8% {NB – It’s probably useful to say 0.3 and 0.8% of what}. I reported the number of signers to a single digit after the decimal, not the 4 or 5 digits used by the original authors. The reason I had only 1 digit is that I was dividing the number of signers by the size of the populations (2,000,000 and 800,000) -- and the population numbers looked like they'd been rounded heavily, down to only 1 digit of precision. When working with numbers of different precisions, the final answer can only have as many digits precision as the worst number in the entire chain.

As another example, and maybe the single most commonly repeated one from climate science, is this page, which gives the fraction of the greenhouse effect due to water vapor in the atmosphere as 95.000% (See Table 3 for his piece de resistance). That's a lot of digits, but does the data really support that level of precision?

Let's take a look at the sources he gives, and then think a little about the situation to see whether 5 digits precision is reasonable. First, the sources he has valid links for (1 of the 9 is broken, and one source doesn't have a link; I'll follow that up at lunch at work in a bit) certainly don't show much precision. Second, are his sources actually scientific? Several are news or opinion pieces, and my favorite is the 21st century science and technology source, which is actually a LaRouche publication, (whose cover articles include "LaRouche on the Pagan Worship of Newton"). The figures given are 96-99% (LaRouche mag), 'over 90%', 'about 95%', and the like. Not a single one gives as high a precision as 95.000%. This should be a red flag to us readers. Whatever can be said about the fraction of greenhouse effect that might be due to water vapor, it obviously can't be said with much precision based on these data. Not if you're being honest about it. (We'll come back in a later post to what can be said about water vapor, and it turns out that even the lowest of the figures is too high if you look at the science.)

{NB I took out the Water vapor example because 1) you just said you’d be doing a later post on water vapor, and 2) if your readers haven’t gotten it by now the last paragraph won’t help them get it either.}

Bob North said...

Good post PD, I quite enjoyed it. Excess precision is definitely one of the things that annoys me. Working in the environmental field, I have seen excess precision many, many times when the results of various tests/analyses are presented in many consultants' or regulators' reports.

I think the problem may have started getting bad with the introduction of calculators and spreadsheets that automatically spit out calculations to 8 digits unless you specify the appropriate number of sig. figs. (just a personal theory, nothing more).

Finally, regarding the water vapor example, if you want to make it more understandable to lay readers, maybe put it in terms of the variability of relative humidity, both throughout a single day and as weather systems move through an area. Just describing the wide range of relative humidity (something most people are more or less familiar with) should be enough to get the point across that 3 decimal places is excessive.

thanks,
Bob North

Penguindreams said...

Thank you for the comments folks. There'll be a later 'excess precision' note, and a separate one on the water vapor issues. 'Divide and conquer' works for writing too.

steve: I think you're right, part of me trying to do too many things at once. Digressing: Did the sea ice answer cover the ground for you? If not, put in your followup.

kcsphil: Thank you for the detailed suggestions. Had I been moving faster on writing the additional notes, I think I'd have kept your notes as if it had been email. For future: do feel free to send email directly to me at plutarchspam at aim dot com. (The 'spam' is part of the correct address.)

bob: also points well taken. I think you're right about the calculators. When working by hand or slide rule, the bias was the other way -- avoid extra digits if possible. The significant digits then were a lower bound on the calculation. Today ... nobody seems to mind copying 13 digits for an answer to a problem that starts 'there are about 3 million ...'.