10 September 2009

Climate and Computer Science

I'll pick up John Mashey's comment from the 'relevance' thread, as it illustrates in another way some of what I mean regarding relevance, and about who might know what. He wrote:

As a group, computer scientists are properly placed in the last tier.

Once upon a time, computer scientists often had early backgrounds in natural sciences, before shifting to CMPSC, especially when there were few undergraduate CMPSC degree programs.

This is less true these days, and people so inclined can get through CMPSC degrees with less physics, math, and statistics than one would expect.

Many computer scientists would fit B3 background, K2-K3 level of knowledge on that chart I linked earlier.

On that scale, I only rate myself a K4, which corresponds roughly to Robert's Tier 5. Many CMPSC PhDs would rate no higher than K2 (or even K1, I'm afraid, on climate science).


Of course John is one who has been spending serious effort at learning the science, so although our shortcut puts him on a low tier in this area (he's high for computer science!), the earned knowledge is higher. Best, of course, is to work from the actual knowledge of the individual. On the other hand, presented with a list of 60 speakers at a meeting, and seeing few from fields in the upper levels (applicable to the topic at hand), it's not a bad bet that the meeting isn't really about the science (or whatever expertise is involved).

If we're talking specifically about climate modellers, we're talking about people who use computers a lot, and make the computers run for very long periods. So, does that mean that all climate modellers are experts about computers the way that computer scientists are? Absolutely not. Again, different matters. Some climate modellers, particularly those from the early days, are quite knowledgeable about gruesome details of computer science. But, as with computer scientists and climate models, that's not the way to bet.

I'll link again to John's K-scale. A computer scientist spends most time learning about computer science. At low levels, this means things like learning programming languages, how to write simple algorithms, and the like. Move up, and a computer scientist will be learning how to write the programs that turn a program into something the computer can actually work with (compilers), how to write the system that keeps the computer doing all the sorts of processing you want it to (operating systems), interesting (to computer scientists, at least :-) things about data structures, databases, syntactic analysis (how to invent programming languages, among other things), abstract algorithms, and ... well probably quite a few more things. It's a long time since I was an undergraduate rooming with the teaching assistant for the operating systems class. Things have changed, I'm sure.

Anyhow, on that scale of computer science knowledge, I probably sit in the K2-K3 level. I use computers a lot. And, on the scale of things in my field, am pretty good with the computer science end of things. But, considered as matters of computer science, things like numerical weather prediction models, ice sheet models, ocean models, climate models, etc., are just not that involved. The inputs take predictable paths through the program (clouds don't get to change their mind about how they behave, unlike what happens when you're making the computer work hard by making it do multiple different taxing operations at the same time and do what you like to the programs as they run). Our programs are very demanding in that it takes a lot of processing to get through to the answer. But in the computer science sense, it's fairly simple stuff -- beat on nail with hammer a billion times; here's your hammer and there's the nail, go to it.

The climate science, figuring out how to design the hammer, what exactly the nail looks like, and whether it's a billion times or a trillion you have to whack on it -- that part is quite complex. So, same as you can do well in my fields with only K2-K3 levels of knowledge of computer science, computer scientists can do well in theirs with only K2-K3 knowledge of climate science (or mechanical engineering, or Thai, or Shakespeare, ...).
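To make the hammer-and-nail picture concrete, here is a minimal sketch, assuming a toy one-dimensional heat diffusion problem rather than anything from a real model. The names `step` and `run`, the five-cell grid, and the value of `alpha` are all made up for illustration. The point is only that the control flow is the same simple loop every time; all the hard decisions live in what goes inside `step`.

```python
def step(temps, alpha):
    """Advance a 1-D row of cell temperatures by one explicit diffusion step.

    The two end cells are held fixed as boundary values (an assumption of
    this toy problem, not a statement about any real model).
    """
    new = temps[:]
    for i in range(1, len(temps) - 1):
        new[i] = temps[i] + alpha * (temps[i - 1] - 2 * temps[i] + temps[i + 1])
    return new


def run(temps, alpha, n_steps):
    """The 'hammer': apply the same step over and over."""
    for _ in range(n_steps):
        temps = step(temps, alpha)
    return temps


initial = [0.0, 0.0, 100.0, 0.0, 0.0]
after_one = step(initial, alpha=0.25)
final = run(initial, alpha=0.25, n_steps=1000)
print(after_one)
print(final)
```

Nothing in the loop depends on the data taking an unexpected branch. A real model has vastly more inside the step, but the shape is similar.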

Again, what the most relevant expertise is depends on what question you're trying to answer or problem you're trying to solve. If you want to write a climate model, you should study a lot of climate science, and a bit of computer science. To write the whole modern model yourself, you'll want to study meteorology, oceanography, glaciology, thermodynamics, radiative transfer, fluid dynamics, turbulence, cloud physics, and at least a bit (these days) of hydrology, limnology, and a good slug of mathematics. On the computer science side, you need to learn how to write in a programming language. That's it. It would be nice to know more, as for all things. But the only thing required, from a computer science standpoint, is a programming language. No need for syntactic analysis, operating system design, or the rest of the list I gave above. Not for climate model building, that is. If you want to solve a different problem, they can be vital. (I include numerical analysis in mathematics -- the field predated the existence of electronic computers. Arguably so did computer science. But the modern field, as with modern climatology, is different than 100 years ago.)

14 comments:

gmcrews said...

Once again, we are treading very close to argument from authority -- a logical fallacy.

You are not the first to argue:

1. Scientists know all there really is to know about engineering.
2. Engineers know all there really is to know about computer programming.
3. Therefore, Scientists really know all there is to know about computer programming.

However, don't forget (tongue-firmly-in-cheek):

0. Teenagers know all there really is to know about everything.

I don't see the purpose of the post, especially since it can easily be taken the wrong way. According to the Bureau of Labor Statistics there are nearly half a million computer programmers in the United States, with about 20% holding a graduate degree. How many would feel this post somehow insults them?

Now (only slightly tongue-in-cheek) I would readily agree that computer programming is not "rocket science". It's much harder! Software projects have a terrible failure rate. That's fundamentally because the only thing that limits the complexity of software is our inability to make it more complex. Software programs represent the most complex objects ever created by mankind. Example: the reason the climate models need bigger and bigger supercomputers is not fundamentally to simply reduce the size of the grids, it's to add more complex numerical stuff to the models!

You point out that climate models currently require: meteorology, oceanography, glaciology, thermodynamics, radiative transfer, fluid dynamics, turbulence, cloud physics, hydrology, limnology, mathematics, and computer science. Nobody knows all this. There is no person at the top of the tiers. Can I then argue ad hominem that this means nobody is "relevant"? Does this mean that someone in a field not listed above therefore is or is not relevant?

Again, why tread here? It's ultimately based on a logical fallacy anyway.

Penguindreams said...

You really haven't read what I've written. Not least, last post and this, I've said explicitly that nobody knows everything. That included that scientists in fact do not know everything about engineering, nor do engineers know everything about computer science.

The fallacy in 'argument from authority' comes from saying that the authority is infallible. I have never said that, nor been close. What I've been saying is that the people who have studied a topic at the highest levels, for the longest, are more likely to be right about that topic than people who have not engaged in such study.

More likely right is not infallible. It is, however, the way to bet. There is such a thing as meaningful authority. If there weren't, you'd pick names at random from the phone book the next time you were sick -- rather than to go look for a doctor.

As to the prospect for 'insult' you raise ... I'd really hope that the number is zero. I'd hope that by the time someone finished a college degree, they had learned that they are not all-knowing.

I'm also a bit optimistic, and think that people who achieve expertise in a field (vs. merely marking time in sufficient classes to be handed a degree) recognize how much work it was for them to achieve that expertise, and figure that people with expertise in some other field had worked comparably hard and long.

Now, if you're telling me that computer scientists don't learn that, I can say that this has not been my experience with computer scientists. As a member myself of the ACM, I also don't see it in their magazine. Nor was that John Mashey's reaction.

I will be returning to the matter of expertise. While you complain of the 'fallacy of argument from authority', which isn't actually present, you are promoting the 'democratic fallacy' -- that everybody's opinion is equal in all things. Hence your 500,000 computer people being meaningful with respect to climate science, an area in which few have any significant knowledge. It's a popular argument, and serves as the basis for the silly petition.

gmcrews said...

Hi Penguindreams,

Thanks for the thoughtful reply and I really did read what you wrote.

I too would like to think people will not be insulted by your post. But, with only a little effort, I found this post and its comments. Oops. A result exactly opposite of what you were trying to accomplish. But typical, I think, of anyone already a skeptic.

Many, many people tend to have very low Bayesian priors when presented with arguments that seem authoritarian. I suggest taking an approach that avoids having to condition on these low priors.

I think trying to get skeptics to change these biases is hopeless.

Penguindreams said...

I'm enough of a realist to know that reactions like the one you point to will occur. I've posted a comment there. We'll see how long it stays up, and what result there is. Interesting, though, that the author doesn't link to the blog (not even a rel='nofollow') here that he's ranting about.

Certain people, I agree, have massive biases, and changing those biases is a hopeless effort. Hence I don't worry much about them. People with such massive biases also can't be considered skeptics. A real skeptic requires evidence, but will change conclusion when the evidence is presented.

John Mashey said...

0) gmcrews: You suggested that computer science be in the top few tiers, but it isn't.

As a computer scientist/software engineering manager/hardware-software architect with a decent science background, who has worked with scientists over many years, I tried to explain why not.

Bob explained it from the viewpoint of a scientist who uses computers.
Our views are quite compatible, and it is truly absurd to think any of this is knocking computer science.

1) Bob's post is quite reasonable.
We might want to say more about the classifications of science and engineering:

a) Scientists often do engineering in pursuit of science. This shows up both in hardware, when building instruments, or in building software.

b) Engineers often use or do science in the pursuit of building state-of-the-art things. Computer scientists often do performance analysis and modeling that can resemble natural sciences data collection and analysis, albeit with easier replication.

c) Scientists doing longer-term programming (not the quick-and-dirty one-shot analysis code) should know:

- relevant languages and tools (not just language, of course). That's programming, just like using Excel can be programming.

- Something about software engineering methodologies, although researchers don't generally need to learn the techniques for truly huge-scale, really long-term commercial-grade software development. The tradeoffs are very different than for most science codes.

- A little bit about computer science. They need to know enough about data structures to be able to select them, and not do things like reinventing linked-lists in FORTRAN for the nth time.

They should either know a little about numerical analysis, or at least have an expert available to keep them out of trouble.

Since performance matters, they need to know a little about algorithmic complexity analysis to avoid performance cliffs. They probably ought to know enough computer architecture to understand the effects of memory hierarchies and distributed systems, although one can argue whether that is CMPSC or EE. People who write big codes and ignore caches can get clobbered, in ways that classic algorithm analysis doesn't address.

But for most scientists, the right few CMPSC courses (beyond learning languages and tools) are likely plenty.
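As a small illustration of the kind of trouble a little numerical analysis helps avoid, here is a sketch using only the Python standard library. The helper name `naive_sum` is made up for this example. It adds ten copies of 0.1 in a plain loop, then again with `math.fsum`, which tracks the rounding error lost at each step.

```python
import math


def naive_sum(values):
    """Add floats left to right, with no compensation for rounding error."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1] * 10
plain = naive_sum(values)
careful = math.fsum(values)

print(plain == 1.0)    # False: rounding error accumulates in the plain loop
print(careful == 1.0)  # True: fsum returns the correctly rounded total
```

The difference here is one part in 10^16, which sounds harmless. Repeat that kind of loss over a very long run, or subtract two nearly equal totals, and it can stop being harmless.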

2) Computer science is a seriously-mongrel field, and in practice, there's more engineering than science, and sometimes there's art, or maybe "engineering elegance".

But, programming, software engineering, and computer science are *not* equivalent, and anyone who equates them reduces their credibility.

Almost anyone can do programming (using Excel can be programming), software engineering requires other skills, and computer scientists often study other things. Not all computer scientists are experienced software engineers, not even computer scientists who are also superb programmers.

I once met with 3 friends who were/are top computer scientists. They were planning new features for a program they'd written together, and wanted advice. A software engineering group I managed had used that program in ways they'd never imagined. One said that getting it to work with 2 other people was the hardest thing he'd ever done, even though they'd worked together for years and had next-door offices. He couldn't believe how Bell Labs ever got multi-year projects to work with hundreds of programmers. I laughed. Different skill set.

3)

gmcrews quotes the author of "The media won’t let the data slow them from continuing our march toward world-wide socialist governance. You may find that statement extreme, in which case my opinion is – you aren’t paying attention."

In the blogosphere, anyone can say anything. Who cares?

gmcrews said...

John Mashey

Great comment sir. Thanks for correcting me.

You touch on the art of programming and how it is not exactly a science. For technical applications, there is a rather unique reason why and it's fairly subtle. Let me make a brief attempt.

In Alan Cooper's book The Inmates Are Running the Asylum, the following questions are asked:

What do you get when you cross a computer with an airplane?
What do you get when you cross a computer with a camera?
What do you get when you cross a computer with an alarm clock?
What do you get when you cross a computer with a car?
What do you get when you cross a computer with a bank?

The answer to all these questions is -- a computer!

Cooper goes on to give examples of why he thinks this is the right answer. Computers change the fundamental nature of machines. We now live in a world where our machines can behave and, importantly, fail in wholly new modes. There are now more car recalls due to computer glitches than due to mechanical flaws. NASA can land probes on Mars without a glitch, but almost lost the missions because they couldn't write a bug free file system for them. (Example of programming being harder than rocket science!) The list keeps growing.

With this in mind: What do you get when you cross our best scientific theories of the climate with a computer? That's right -- a computer!

Adding the computer creates brand new possibilities for a whole host of "emergent behaviors" (not all of them necessarily seen in nature) and entirely new verification and validation issues.

Combine this with rapidly evolving software technology and the result is an art. It's why I think programmers (thanks John for correcting me) should be on a higher tier. (If insisting on having tiers.)

John Mashey said...

Back to relevance. I'm still buried with APS, but here is some promised material, although missing longer text that should be written. It is wise to print the charts.

MAP BCK, K-Scale+.

B Categories. Backgrounds in more detail. This is the *general* case. For climate science, all but your top tier might map into subcategories in B1-B3.

C Categories not yet

KMBW.a MAP is a complex response to complaints that the K-scale in BCK didn't have "negative knowledge". I tried several ideas, none of which worked well.

Wrong memes can be categorized by the level of knowledge needed to easily detect their wrongness 50% of the time. So, for example "Mars is warming, so not CO2" might well be accepted by someone @ K0, but it means something very different than when Lindzen (~K9) says it, as he has.
(The first is lack of knowledge, the second must be something else).

KMBW.b Flow categorizes the different parts of the KMBW.a map, and shows the flows.

The two above introduce the idea that the further away from the diagonal line one gets to left-and-up, the more intense must be the extra-science set of reasons for doing this, that's the "bias" idea. (I'm just working on this now). The reasons, and their applicability are described in the following sheets. Of course, someone can have little knowledge and a long list of anti-science reasons as well, but it takes a while to assess the latter.

M-Scale: Catalog of memes by difficulty, specifically for climate. Not yet.

OBR MAP

Shows Organizations, People's Backgrounds, and Plausible Reasons for Anti-Science, i.e., condenses B Categories and the next two into one Map.

O CATEGORIES
Are categories and subcategories of organizations involved in anti-science.
This exists, among other reasons, to avoid another wrong meme, "If you do not accept AGW, you must be funded by Exxon". As best as I can tell (much money-laundering), only a few are directly funded by Exxon. Much more comes from family foundations.

R ATTRIBUTES.
Reasons for anti-science, in more detail. This is the evolution of old post at Deltoid, and informed by discussion at JohnQuiggin, whose taxonomy used higher-level groupings. Think of them as molecules compared to my atoms.

The OBR and R items are fairly general, on purpose, to cover various flavors of anti-science.
R has a few climate-specific comments, but needs a specialization (R.a) that applies directly to climate science.

D Scale
Dedication to anti-science, or activity level.
Singer, Michaels, et al are 5s. Someone who signs one petition for a friend, and otherwise has no presence, is 1.

Tedsaid said...

Well, I read your post, both here and at the Air Vent. How can I say this nicely? ... well, it's a pretty dumb pyramid.

Do I take my car to a mechanic? Yes. Do I call a plumber about my broken pipes? Yes. The problem is: many, many "climate scientists" do not.

They often do their own statistical analysis, then argue with statistics experts (see: M. Mann, hockey stick). They point to increasing damage due to hurricanes, and forget to consult the economists who would tell them we're building more stuff on the coast. They say the polar bears are dying, and forget to ask the zoologist, who will tell you polar bears are doing better where it's a bit warmer.

This sort of arrogance permeates the global warming "science," which is the exact opposite of the situation that your pyramid is attempting to explain. They don't go to the experts when they should ... and that, in a nutshell, is the whole problem.

Tedsaid said...

Well, I posted a succinct, compelling, and slightly insulting argument against your pyramid scheme. And you allowed it! Congratulations - you now have more integrity than RealClimate.org. They would never leave a well-formulated criticism that they disagree with on their site. (Try it yourself and you'll see.) Cheers.

jg said...

Tedsaid: Was your first comment genuine or only a test? Treating it as genuine I have to dispute a few things: 1) Regarding hurricanes, the effort is to understand whether they will become more frequent and/or more intense based on reconstructing tropical sea surface temperatures, occurrences of El Nino-La Nina events, and storm tracks and intensities. So far, the only direct evidence of track and location that I know of is overwash of debris in low spots near the shoreline reachable only by hurricanes. However, all of these criteria seem unaffected by increases in human coastal development.
Regarding polar bears: Even if a polar bear likes it a little warmer, the concern is not whether the creature breaks a sweat, but that a small global increase in temperature will be amplified in the arctic and remove its habitat. I'll ask a biologist how many top predators thrive when their habitat is remade.
And the Mann hockey stick, wasn't this reviewed by the NAS and passed?

thanks,
jg

Tedsaid said...

Hi, JG. Thanks for your comments. No, my first post was genuine (if a bit rude ... sorry about that.) My second post was merely an observation, based on my and others' experiences with Mann's RealClimate site.

Regarding hurricanes: I was responding to some published reports that hurricanes have been increasing in number and intensity over the last 50-100 years, based on economic damage. Economists have proven that this is merely due to increased development in hurricane-prone areas. Other reports (such as here) show no change up or down in the number of hurricanes over the last century.

However, you seem to be referring to hurricanes in the paleoclimate record? In which case you can rest assured, the only increase during the 20th century over the last 1500 years is due merely to poor statistical handling.

Regarding polar bears: I didn't mean they prefer to be warmer. I meant ... well, there are 19 sub-populations of polar bears, for a total population of around 25,000. (Up from about 5,000 around 1970.) Once you control for the 300-500 that are hunted/killed each year, most of these populations are stable, a couple are declining, and a couple are increasing. Where they are increasing, temperatures have gone up; the opposite is true where they are declining. (source)

Regarding the hockey stick and the NAS: well, no. The NAS agreed that the statistical methods were flawed, then widened the scope of their inquiry to look at newer reconstructions of proxy data. In the end all they could possibly say about Mann's reconstruction prior to 1600 was that it is "plausible."

Meanwhile, the Wegman report characterized Mann's methods as "somewhat obscure and incomplete," and McIntyre and McKitrick's criticisms as "valid and compelling."

But to my original criticism - that they do not consult with statisticians on matters of statistics - it's actually worse than that. For years Mann, et al, actively tried to prevent anyone from reviewing their data and methods. That controversy is helpfully summarized here.

Just in case, here's a bit more on the medieval warm period.

Hope this helps! Cheers.

John Mashey said...

1) Those unfamiliar with this turf might benefit from reading:

William J. Kaufmann III, Larry L. Smarr, Supercomputing and the Transformation of Science, 1993, Scientific American Library. You can get a copy for ~shipping cost.

It's beautifully illustrated, and still relevant. I'd recommend a newer equivalent, but I don't know any, and Larry says he doesn't know of one either.

2) Bob might want to poke at the following, which arises from my time in supercomputing discussing issues with petroleum geophysicists, biomedical folks, mechanical engineers, but especially with climate modelers of NCAR, GFDL, NASA, etc.

In science modeling (with or without computers), there seems an equivalent of Liebig's Law of the Minimum for plant growth, except that the 3 necessary items are:

a) Data
b) Science (theory)
c) Enough computation power to get cost-effective answers soon enough to be useful.

SO, some combinations:
a+b, not c:
Some classes of modeling use grids, whose grid elements need to be small enough in both time and space to see the real behavior, and that can lead to an explosion in compute power required.

Astrophysicist Paul Woodward had a nice example of that on our systems in the early 1990s. Specifically, some real turbulence effects simply do not appear until grid elements are "small enough". Some effects are usefully approximated by averages, some require distributions, some just require huge numbers of smaller grid elements. NASA-AMES folks used to beat us up to get bigger main memories all the time.
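The size of that explosion can be sketched with simple arithmetic. This assumes a regular 3-D grid and assumes that the time step has to shrink in proportion to the grid spacing (a common stability constraint for explicit schemes, but an assumption here, as are the function names). Memory then grows with the number of cells, and compute with cells times steps.

```python
def memory_factor(refinement, space_dims=3):
    """Growth in cell count when grid spacing shrinks by `refinement` per dimension."""
    return refinement ** space_dims


def compute_factor(refinement, space_dims=3):
    """Growth in work, assuming the time step shrinks by the same factor as the spacing."""
    return refinement ** (space_dims + 1)


for r in (2, 4, 10):
    print(r, memory_factor(r), compute_factor(r))
# 2 8 16
# 4 64 256
# 10 1000 10000
```

Halving the grid spacing costs 8 times the memory and 16 times the computation under these assumptions, which is why requests for bigger main memories and faster machines tend to arrive together.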

b+c, not a:
Mechanical engineering usually has enough data, whether of airplanes, cars, chocolate ice cream bars, or Barbie dolls, i.e., one often has fairly precise *design* data.
On the other hand, I suspect climate modelers would commit vile acts to obtain modern temperature records, but going back 1000, 2000, 10,000, 20,000 years. The petroleum folks of course spend tons of money to obtain good data. The medical folks would be pleased to have cost-effective MRI/CAT scanners with better resolution.

a+c, not b:

It can be useful to crunch a lot of data, find correlations, and use them to see if there's something real going on. But, if there's no physical mechanism that works, this is just an instance of the "data mining fallacy", to which some people seem prone. This often generates discovery of "X-year cycles", which evaporate when the next X-year cycle comes. The purely-statistical approach tends to make people think that climate models work that way, which is amusing.
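A minimal sketch of how that fallacy arises, using made-up random data only (nothing here is a real climate or economic series, and `pearson` is a hand-written helper, not a library call). A short "target" series of pure noise is compared against 2000 other series of pure noise, and the best match is reported.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n_points = 10           # a short record, like a handful of "cycles"

target = [rng.gauss(0, 1) for _ in range(n_points)]
candidates = [[rng.gauss(0, 1) for _ in range(n_points)] for _ in range(2000)]

# Search every candidate and keep whichever happens to match the target best.
best = max(candidates, key=lambda c: abs(pearson(c, target)))
best_r = pearson(best, target)
print(f"best correlation found among pure noise: {best_r:.2f}")
```

With this many candidates and so few points, a strong-looking correlation turns up by chance alone. Without a physical mechanism behind it, there is no reason to expect it to survive the next batch of data.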

A common error among people with deep, but narrow modeling experience, is to assume that their issues are the same as others', but they often aren't.
(That's TEC8 from my Reasons list). I discussed some specific cases at RC a while ago.

One may note that it took legions of (mostly) women at desk calculators doing the computing for the Manhattan Project. People used to worry a lot about export control of supercomputers, which in fact made sense for not helping people optimize weapons design. Of course, for quite a while, a desktop PC has been more than enough for basic design.

Among the most fun discussions I've had were of the form:

Q (from me): what are your problems, and which ones will be helped by our next-generation systems with X *FLOPS?
Long discussion of what was understood, what wasn't, what smaller grid elements would help with, what it wouldn't, etc.

Penguindreams said...

Tedsaid:
Sorry about the delay. I normally do the comment approval from email and was interrupted. When I came back, I assumed that I'd already approved. Thanks for the reminder post.

To your comment: Wegman's report was not an NAS report. He was requested by a politician to make a report, the politician being, shall we say, rather partisan. The scientific community review is in the professional literature, or, as a second best, the National Research Council report. The wikipedia article on the 'hockey stick controversy' is fairly reasonable in presenting more than one view.

A note on vocabulary: Plausible is not a bad thing in science. You're actually doing pretty well to get to 'plausible'.

In any case, it's a decade since the paper came out. Much new work has been done, including by other people with other methods, and still finds that we're currently quite warm, and the rise is exceptionally fast. The science has moved on. (Actually this points to a good topic for an article.)

John: Yes, that does look like a good arena for an article. My much more simple-minded thought from way back is that people who are resource-poor, do theory. Resource-rich tends to further applications. I like the 3 part view better.

computer programming said...

Good Stuff!! I really love!!