20.12.09

TORTURING THE DATA. Mathematical statisticians and theoretical econometricians have produced all sorts of methods to purge observed variables, and the relationships estimated from them, of biases and errors. (There's something titillating to some practitioners in the expression "regress residuals".) A recent roundup of Climaquiddick that's nailed to Newmark's Door recommends an instructive Iowahawk clinic in statistical inference.

Since the Climategate email affair erupted a few weeks ago, it has generated a lot of chatter in the media and across the internet. In all the talk of "models" and "smoothing" and "science" and "hide the decline" it became apparent to me that very very few of the people chiming in on this have even the slightest idea what they are talking about. This goes for both the defenders and critics of the scientists.

Long story, but I do know a little bit about statistical data modeling -- the principal approach used by the main cast of characters in Climategate -- and have a decent understanding of their basic research paradigm.

I think it was E. E. Leamer who paraphrased Bismarck to the effect that you didn't want to watch sausage or econometric estimates being made. The point of smoothing and some of the more advanced techniques for error correction is to obtain an error structure for the model that approximates that of the classical regression model with independent, identically distributed errors. And within applied economics, there are two approaches to the problem of error correction. As a broad generalization, the North American method is to look at the residuals and correct the errors after the fact. The European method is to change the specification, preferably by including a variable that is generating the troublesome residuals. The climate change research is a melange of both.
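
To make the distinction concrete, here is a toy sketch in Python (plain numpy, entirely simulated data, nobody's actual model): omit a slow-moving variable and the Durbin-Watson statistic flags autocorrelated residuals; you can then either patch the errors after the fact (a one-step Cochrane-Orcutt quasi-differencing stands in for the North American habit) or respecify by adding the offending variable (the European habit).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.arange(n)
x1 = rng.normal(size=n)                      # included regressor
x2 = np.sin(2 * np.pi * t / 50)              # slow-moving variable we "forget"
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(scale=0.5, size=n)

def ols(X, y):
    """Ordinary least squares: return coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def durbin_watson(e):
    """Durbin-Watson statistic; values far below 2 suggest autocorrelated errors."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Misspecified model: x2 omitted, so its signal ends up in the residuals.
X_bad = np.column_stack([np.ones(n), x1])
_, e_bad = ols(X_bad, y)
print("DW with x2 omitted:          ", round(durbin_watson(e_bad), 2))

# "North American" habit: keep the specification and treat the errors after
# the fact (here, one Cochrane-Orcutt quasi-differencing step).
rho = np.sum(e_bad[1:] * e_bad[:-1]) / np.sum(e_bad[:-1] ** 2)
_, e_co = ols(X_bad[1:] - rho * X_bad[:-1], y[1:] - rho * y[:-1])
print("DW after quasi-differencing: ", round(durbin_watson(e_co), 2))

# "European" habit: change the specification by adding the offending variable.
X_good = np.column_stack([np.ones(n), x1, x2])
_, e_good = ols(X_good, y)
print("DW with x2 included:         ", round(durbin_watson(e_good), 2))
```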

Over the last 20 years or so, paleoclimatology saw the emergence of a new paradigm in climate reconstruction that utilized relatively sophisticated statistical modeling and computer simulation. Among others, practitioners of the emerging approach included the now-famous Michael Mann, Keith Briffa, and Philip Jones. For the sake of brevity I'll call this group "Mann et al."

The approach of Mann et al. resulted in temperature reconstructions that looked markedly different from those previously estimated, and it first received widespread notice in a 1998 Mann paper that appeared in Nature. The new reconstruction estimated a relatively flat historical temperature series until the past hundred years, at which point it began rising dramatically, accelerating around 1990.

On the one hand, the researcher has to be able to understand something called principal component analysis, or at least to be able to invoke a command to a canned algorithm that will do the work for them. On the other hand, one gets into treacherous waters even before that.
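
For readers who have never met the canned algorithm in question, a bare-bones version of principal component analysis fits in a dozen lines of numpy. The proxy matrix below is simulated, not any real network; the point is only to show what "extracting the leading component" means.

```python
import numpy as np

rng = np.random.default_rng(1)
years, n_proxies = 150, 8
signal = np.cumsum(rng.normal(size=years))          # a common underlying "signal"
loadings = rng.uniform(0.5, 1.5, size=n_proxies)    # each proxy tracks it differently
proxies = np.outer(signal, loadings) + rng.normal(scale=2.0, size=(years, n_proxies))

# Centre each column, then take the leading left singular vector: that gives the
# first principal component score series, i.e. the dominant shared pattern.
centred = proxies - proxies.mean(axis=0)
U, S, Vt = np.linalg.svd(centred, full_matrices=False)
pc1 = U[:, 0] * S[0]

# The sign of a principal component is arbitrary; we can align it with the true
# signal only because, in this simulation, we happen to know the signal.
if np.corrcoef(pc1, signal)[0, 1] < 0:
    pc1 = -pc1
print("correlation of PC1 with the simulated signal:",
      round(float(np.corrcoef(pc1, signal)[0, 1]), 2))
```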

Although observed temperature measurements prior to 1850 are unavailable, there are a number of natural phenomena that are potentially related to global temperatures and can be observed retrospectively over 1000 years through various means. Let's call these "proxy variables" because they are theoretically related to temperature. Some proxies are "low frequency" or "low resolution," meaning they can only be measured in big, multi-year time chunks; for example, atmospheric isotopes can be used to infer solar radiation going back more than 1000 years, but only in 5-20 year cycles. Other low frequency proxies include radiocarbon dating of animal or plant populations and volcano eruptions.

By contrast, some proxy variables are "high frequency" or "high resolution," meaning they can be measured a long time back at an annual level. Width and density of tree rings are an obvious example, as is the presence of the O-18 isotope in annually striated glacial ice cores. In principle these proxies are more useful in historic temperature reconstruction because they can be measured more precisely, more frequently, and in different places around the planet.
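
A trivial sketch of the resolution difference, again with made-up numbers: the same simulated series viewed annually (the tree-ring case) and then only as ten-year block averages, which is all a coarse proxy allows you to see.

```python
import numpy as np

rng = np.random.default_rng(2)
years = 100
temperature = np.cumsum(rng.normal(scale=0.1, size=years))       # simulated, annual

annual_proxy = temperature + rng.normal(scale=0.2, size=years)   # "high frequency"
block = 10
decadal_proxy = annual_proxy.reshape(years // block, block).mean(axis=1)  # "low frequency"

print("annual observations: ", annual_proxy.size)    # 100 values, one per year
print("decadal observations:", decadal_proxy.size)   # 10 values, one per decade
```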

A proxy variable follows the movement of the unobserved variable, in this case temperature, but with an error. And thus the statistical magic. But ...
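
The "magic", in its simplest form, is a calibration regression: fit the proxy-temperature relation over the instrumental overlap, then push it back in time. The sketch below is a deliberately naive illustration with simulated numbers, not Mann's method.

```python
import numpy as np

rng = np.random.default_rng(3)
years = 300
true_temp = np.cumsum(rng.normal(scale=0.05, size=years))        # unobserved, 300 yrs
proxy = 0.8 * true_temp + rng.normal(scale=0.3, size=years)      # observed, with error
instrumental = true_temp[-100:]                                  # "thermometer era" only

# Calibrate: regress instrumental temperature on the proxy over the overlap.
X = np.column_stack([np.ones(100), proxy[-100:]])
a, b = np.linalg.lstsq(X, instrumental, rcond=None)[0]

# Reconstruct: apply the fitted relation to the full proxy record.
reconstruction = a + b * proxy
err = reconstruction[:-100] - true_temp[:-100]
print("RMS reconstruction error, pre-instrumental period:",
      round(float(np.sqrt(np.mean(err ** 2))), 2))
```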

What you'll find, contrary to Mann's assertion that the hockey stick is "robust," is that the reconstructions tend to be sensitive to the data selection. M&M found, for example, that temperature reconstructions for the 1400s were higher or lower than today, depending on whether bristlecone pine tree rings were included in the proxies.
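
A stylized version of that sensitivity problem is easy to simulate: give one proxy a strong trend of its own and watch the composite change when it is dropped. Nothing below is real data, and the averaging is far cruder than anything in the literature; it only illustrates why "robust" is a claim worth checking.

```python
import numpy as np

rng = np.random.default_rng(4)
years, n = 600, 5
t = np.arange(years)
signal = np.sin(2 * np.pi * t / 300)                         # shared "climate" signal
proxies = signal[:, None] + rng.normal(scale=0.8, size=(years, n))
proxies[:, 0] += 0.01 * t                                    # one proxy with its own trend

def reconstruct(P):
    """Crude composite reconstruction: the average of the standardized series."""
    z = (P - P.mean(axis=0)) / P.std(axis=0)
    return z.mean(axis=1)

full = reconstruct(proxies)
without_rogue = reconstruct(proxies[:, 1:])

def early_minus_late(r):
    """How warm the first century looks relative to the last one."""
    return round(float(r[:100].mean() - r[-100:].mean()), 2)

print("early minus late, all proxies:     ", early_minus_late(full))
print("early minus late, rogue series out:", early_minus_late(without_rogue))
```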

What the leaked emails reveal, among other things, is some of that bit of principal component sausage-making. But more disturbing, they reveal that the actual data going into the reconstruction model -- the instrumental temperature data and the proxy variables themselves -- were ripe for manipulation. In the laughable euphemism of Philip Jones, "value added homogenized data."

The confession the interrogators wished to obtain from Nature was one that would ... wait for it ... be funded and published:

Successful candidates will:

1) Demonstrate AGW.
2) Demonstrate the catastrophic consequences of AGW.
3) Explore policy implications stemming from 1 & 2.

That's from an Open Cogitations Climaquiddick roundup that will also reward careful study.

The exposure of confirmation biases in funding and publishing poses a problem for academic research more generally. It is difficult to defend scholarly inquiry as disinterested truth-seeking if grants and contracts involve experimental or statistical designs that anticipate a conclusion, and if publication treats some anomalies as more worthy of consideration than others.
