Fabius Maximus website

Karl Popper explains how to open the deadlocked climate policy debate

Summary: Many factors have frozen the public policy debate, but none more important than the lack of interest on both sides in tests that might provide better evidence, and perhaps restart the discussion. Even worse, too little thought has been given to the criteria for validating climate science theories (aka their paradigm) and the models built upon them. This series looks at the answers to these questions given to us by generations of philosophers and scientists, answers we have ignored. This post shows how Popper’s insights can help us. The clock is running for actions that might break the deadlock. Eventually the weather will give us the answers, perhaps at ruinous cost.

“Confirmations should count only if they are the result of risky predictions; that is to say, if, unenlightened by the theory in question, we should have expected an event which was incompatible with the theory — an event which would have refuted the theory.”
— Karl Popper in Conjectures and Refutations: The Growth of Scientific Knowledge (1963).

“I’m considering putting ‘Popper’ on my list of proscribed words.”
— Steve McIntyre’s reaction at Climate Audit to a mention that Popper’s falsification criterion is the hallmark of science. An example of why the policy debate has gridlocked.

This graph creates a high bar for useful predictions by climate models

From the Department of Energy’s Carbon Dioxide Information Analysis Center.

What test of climate models suffices for public policy action?

Climate scientists publish little about the nature of climate science theories. What exactly is a theory or a paradigm? Must theories be falsifiable, and if so, what does that mean? Scientists have their own protocols for such matters, and so usually leave these questions to philosophers and historians, or to symposiums over drinks. Yet in times of crisis — when the normal process of science fails to meet our needs — the answers to these questions provide tools that can help.

A related but distinct debate concerns the public policy response to climate change, which uses the findings produced by climate scientists and other experts. Here insights about the dynamics of the scientific process and the basis for proof can guide decision-making by putting evidence and expert opinion in a larger context.

A previous post in this series (links below) described how Thomas Kuhn’s theories explain the current state of climate science. This post looks to the work of Karl Popper (1902-1994) for advice about breaking the gridlocked public policy debate about climate change. At the end of this post is the best-known section of his work about this.

Popper said scientific theories must be falsifiable, and that prediction was the gold standard for their validation. Less well known is his description of what makes a compelling prediction: it should be “risky” — of an outcome contrary to what we would otherwise expect. A radical new theory that predicts that the sun will rise tomorrow is falsifiable by darkness at noon — yet watching the dawn provides little evidence for it. Contrast that with the famous 1919 test of general relativity, whose prediction was contrary to that of the then-standard theory.

How does this apply to climate science?

From NOAA’s interactive Climate At A Glance graphing page.

Predictions of warming

“The globally averaged combined land and ocean surface temperature data as calculated by a linear trend, show a warming of 0.85 [0.65 to 1.06] °C, over the period 1880 to 2012, when multiple independently produced datasets exist. …

“It is extremely likely that more than half of the observed increase in global average surface temperature from 1951 to 2010 was caused by the anthropogenic increase in greenhouse gas concentrations and other anthropogenic forcings together. The best estimate of the human-induced contribution to warming is similar to the observed warming over this period.”

— From the Summary for Policymakers of the IPCC’s Working Group I report of AR5.

Popper’s insight raises the bar for testing the predictions of climate models. The world has warmed since the late 19th century; anthropogenic forcings became dominant only after WWII. The naive prediction is that the warming will continue. It requires no knowledge of greenhouse gases or of the theory of anthropogenic global warming.

A risky test requires a prediction that differs from “more of the same”. Forecasts of accelerated warming late in the 21st century qualify as “risky”, but they provide no evidence today. Hindcasts (matching model output against past observations) provide only weak evidence for the policy debate, because the past data were available to the models’ developers.
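The “more of the same” baseline can be made concrete. Here is a minimal sketch in Python, using synthetic numbers that merely stand in for a real temperature series: fit a linear trend to observations up to a cutoff year and extrapolate it forward. By Popper’s criterion, a climate model earns evidential credit only where its forecast differs from, and outperforms, this naive extrapolation.

```python
import numpy as np

def naive_trend_forecast(years, temps, future_years):
    """Extrapolate the linear trend fitted to past observations.

    This is the 'more of the same' baseline: it uses no physics and
    no greenhouse-gas theory, only persistence of the past trend.
    """
    slope, intercept = np.polyfit(years, temps, deg=1)
    return slope * np.asarray(future_years) + intercept

# Synthetic illustration only (not real data): a 0.01 C/yr warming
# trend plus observational noise.
rng = np.random.default_rng(0)
years = np.arange(1950, 2001)
temps = 0.01 * (years - 1950) + rng.normal(0.0, 0.05, years.size)

# The baseline forecast for the next decade, made with 2000's knowledge.
future = np.arange(2001, 2011)
baseline = naive_trend_forecast(years, temps, future)
```

Any model whose decadal forecast merely tracks `baseline` is making the safe prediction, not the risky one.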

As usual in climate science, these points have been made, and ignored. For example, see “Should we assess climate model predictions in light of severe tests?” by Joel Katzav (Professor of Philosophy, Eindhoven University of Technology) in EOS (of the American Geophysical Union), 11 June 2011. He builds upon Popper’s call for “severe testing” in The Logic of Scientific Discovery. It’s worth reading in full; here is an excerpt.

The scientific community has placed little emphasis on providing assessments of CMP {climate model prediction} quality in light of performance at severe tests. Consider, by way of illustration, the influential approach adopted by Randall et al. in chapter 8 of their contribution to the fourth IPCC report. This chapter explains why there is confidence in climate models thus: “Confidence in models comes from their physical basis, and their skill in representing observed climate and past climate changes”.

…CMP quality is thus supposed to depend on simulation accuracy. However, simulation accuracy is not a measure of test severity. If, for example, a simulation’s agreement with data results from accommodation of the data, the agreement will not be unlikely, and therefore the data will not severely test the suitability of the model that generated the simulation for making any predictions.

…It appears, then, that a severe testing approach to assessing CMP quality would be novel. Should we, however, develop such an approach? Arguably, yes …. First, as we have seen, a severe testing assessment of CMP quality does not count simulation successes that result from the accommodation of data in favor of CMPs. Thus, a severe testing assessment of CMP quality can help to address worries about relying on such successes, worries such as that these successes are not reliable guides to out-of-sample accuracy, and will provide important policy-relevant information as a result.


The public policy debate about climate change has gridlocked in part because many consider the evidence given insufficient to warrant massive expenditures and regulatory changes. The rebuttal has largely consisted of “trust us” and of screaming “denier” at critics. Neither has produced progress; future historians will wonder why anyone expected them to.

This series seeks tests that both sides can accept — that might move the policy debate beyond today’s futile bickering.

The insights of Thomas Kuhn and the advice of Popper offer a possible solution: test the models from the past four IPCC Assessment Reports against observations from our past but their future. Run them with the observations made after their creation (not scenarios), so that they produce predictions rather than projections, and compare those predictions with what actually happened. This will produce better evidence than we have today, but it still might not provide the “risky” prediction necessary to warrant massive public policy action, diverting resources from other critical challenges (e.g., preparing for the return of past extreme weather events, addressing poverty, avoiding destruction of ocean ecosystems).
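The test proposed above is, in effect, standard out-of-sample validation. A hedged sketch of the scoring step follows (illustrative Python; the “model” numbers are hypothetical stand-ins, not output from any actual climate model): compare the model’s error on post-publication observations against the error of the naive trend extrapolation. A skill score above zero means the model beat “more of the same”.

```python
import numpy as np

def skill_vs_naive(obs, model_forecast, naive_forecast):
    """Skill score of a model forecast relative to the naive baseline.

    1.0 = perfect forecast; 0.0 = no better than 'more of the same';
    negative = worse than the naive trend extrapolation.
    """
    mse_model = np.mean((np.asarray(obs) - np.asarray(model_forecast)) ** 2)
    mse_naive = np.mean((np.asarray(obs) - np.asarray(naive_forecast)) ** 2)
    return 1.0 - mse_model / mse_naive

# Illustrative numbers only (hypothetical anomalies in C, not real data).
obs   = [0.41, 0.45, 0.43, 0.50, 0.54]   # observed after model publication
model = [0.42, 0.44, 0.46, 0.49, 0.53]   # the archived model's prediction
naive = [0.40, 0.41, 0.42, 0.43, 0.44]   # linear-trend extrapolation

score = skill_vs_naive(obs, model, naive)
```

The design choice matters: scoring against the naive baseline, rather than against raw observations alone, is what turns a mere accuracy check into the kind of severe test Popper and Katzav describe.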

The criteria for proving current theories about climate change have received too little attention; most effort has gone into increasingly elaborate hindcasts (see this list of papers). Progress will come from better efforts to test the models, new insights from climate scientists, and the passage of time. But by themselves these might prove insufficient to produce timely policy action on the necessary scale. We should add to that list “developing better methods of model validation”.

Karl Popper’s reaction to modern climate science: facepalm.

Update: a “severe test” more severe than Popper’s

In a comment at Climate Etc, Willard points to a powerful analysis by Deborah Mayo: “Severe tests, arguing from error, and methodological underdetermination” in Philosophical Studies, 86 (3), 1997. There are levels of severe tests, some more severe than Popper’s. Excerpt, emphasis added.

“Popper’s problem here is that the grounds for the “best tested” badge would also be grounds for giving the badge to countless many other (not yet even thought of) hypotheses, had they been the ones considered for testing. So this alternative hypothesis objection goes through for Popper’s account.

“This is not the case for the severity criterion I have set out. A non-falsified hypothesis H that passes the test failed by each rival hypothesis H′ that has been considered, has passed a severe test for Popper – but not for me. Why not? Because for H to pass a severe test in my sense it must have passed a test with high power at probing the ways H can err. And the test that alternative hypothesis H′ failed need not be probative in the least so far as H’s errors go. So long as two different hypotheses can err in different ways, different tests are needed to probe them severely.”

Other posts about the climate policy debate

For More Information

Join the debate about this post at Climate Etc., the website of Judith Curry (Prof of Atmospheric Science at GA Inst of Tech): 530 comments and still running strong. Post your thoughts about this here or there.

Please like us on Facebook, follow us on Twitter. For more information see The keys to understanding climate change, My posts about climate change, and especially these about computer models…

  1. About models, increasingly often the lens through which we see the world.
  2. Will a return of rising temperatures validate the IPCC’s climate models?
  3. We must rely on forecasts by computer models. Are they reliable?
  4. A frontier of climate science: the model-temperature divergence.
  5. Do models accurately predict climate change? — By eminent climate scientist Roger Pielke Sr.
  6. Do models accurately predict climate change?
  7. How accurate are climate scientists’ findings? Look at ocean warming.

Popper’s advice to us

Excerpt from Conjectures and Refutations: The Growth of Scientific Knowledge
by Karl Popper (1963)

After the collapse of the Austrian Empire there had been a revolution in Austria: the air was full of revolutionary slogans and ideas, and new and often wild theories. Among the theories which interested me Einstein’s theory of relativity was no doubt by far the most important. Three others were Marx’s theory of history, Freud’s psychoanalysis, and Alfred Adler’s so-called individual psychology.

There was a lot of popular nonsense talked about these theories, and especially about relativity (as still happens even today), but I was fortunate in those who introduced me to the study of this theory. We all — the small circle of students to which I belonged — were thrilled with the result of Eddington’s eclipse observations which in 1919 brought the first important confirmation of Einstein’s theory of gravitation. It was a great experience for us, and one which had a lasting influence on my intellectual development.

The three other theories I have mentioned were also widely discussed among students at that time. I myself happened to come into personal contact with Alfred Adler, and even to co-operate with him in his social work among the children and young people in the working-class districts of Vienna where he had established social guidance clinics.

It was during the summer of 1919 that I began to feel more and more dissatisfied with these three theories — the Marxist theory of history, psychoanalysis, and individual psychology; and I began to feel dubious about their claims to scientific status. My problem perhaps first took the simple form, “What is wrong with Marxism, psychoanalysis, and individual psychology? Why are they so different from physical theories, from Newton’s theory, and especially from the theory of relativity?”


To make this contrast clear I should explain that few of us at the time would have said that we believed in the truth of Einstein’s theory of gravitation. This shows that it was not my doubting the truth of those other three theories which bothered me, but something else. Yet neither was it that I merely felt mathematical physics to be more exact than the sociological or psychological type of theory. Thus what worried me was neither the problem of truth, at that stage at least, nor the problem of exactness or measurability. It was rather that I felt that these other three theories, though posing as sciences, had in fact more in common with primitive myths than with science; that they resembled astrology rather than astronomy.

I found that those of my friends who were admirers of Marx, Freud, and Adler, were impressed by a number of points common to these theories, and especially by their apparent explanatory power. These theories appeared to be able to explain practically everything that happened within the fields to which they referred. The study of any of them seemed to have the effect of an intellectual conversion or revelation, opening your eyes to a new truth hidden from those not yet initiated. Once your eyes were thus opened you saw confirming instances everywhere: the world was full of verifications of the theory. Whatever happened always confirmed it. Thus its truth appeared manifest; and unbelievers were clearly people who did not want to see the manifest truth; who refused to see it, either because it was against their class interest, or because of their repressions which were still “un-analysed” and crying aloud for treatment.

The most characteristic element in this situation seemed to me the incessant stream of confirmations, of observations which “verified” the theories in question; and this point was constantly emphasized by their adherents.

A Marxist could not open a newspaper without finding on every page confirming evidence for his interpretation of history; not only in the news, but also in its presentation — which revealed the class bias of the paper — and especially of course in what the paper did not say. The Freudian analysts emphasized that their theories were constantly verified by their “clinical observations.”

As for Adler, I was much impressed by a personal experience. Once, in 1919, I reported to him a case which to me did not seem particularly Adlerian, but which he found no difficulty in analysing in terms of his theory of inferiority feelings, although he had not even seen the child. Slightly shocked, I asked him how he could be so sure. “Because of my thousandfold experience,” he replied; whereupon I could not help saying: “And with this new case, I suppose, your experience has become thousand-and-one-fold.”

What I had in mind was that his previous observations may not have been much sounder than this new one; that each in its turn had been interpreted in the light of “previous experience,” and at the same time counted as additional confirmation. What, I asked myself, did it confirm? No more than that a case could be interpreted in the light of the theory. But this meant very little, I reflected, since every conceivable case could be interpreted in the light of Adler’s theory, or equally of Freud’s.

I may illustrate this by two very different examples of human behaviour: that of a man who pushes a child into the water with the intention of drowning it; and that of a man who sacrifices his life in an attempt to save the child. Each of these two cases can be explained with equal ease in Freudian and in Adlerian terms. According to Freud the first man suffered from repression (say, of some component of his Oedipus complex), while the second man had achieved sublimation. According to Adler the first man suffered from feelings of inferiority (producing perhaps the need to prove to himself that he dared to commit some crime), and so did the second man (whose need was to prove to himself that he dared to rescue the child). I could not think of any human behaviour which could not be interpreted in terms of either theory.

It was precisely this fact — that they always fitted, that they were always confirmed — which in the eyes of their admirers constituted the strongest argument in favour of these theories. It began to dawn on me that this apparent strength was in fact their weakness.

With Einstein’s theory the situation was strikingly different. Take one typical instance — Einstein’s prediction, just then confirmed by the findings of Eddington’s expedition. Einstein’s gravitational theory had led to the result that light must be attracted by heavy bodies (such as the sun), precisely as material bodies were attracted. As a consequence it could be calculated that light from a distant fixed star whose apparent position was close to the sun would reach the earth from such a direction that the star would seem to be slightly shifted away from the sun; or, in other words, that stars close to the sun would look as if they had moved a little away from the sun, and from one another.

This is a thing which cannot normally be observed since such stars are rendered invisible in daytime by the sun’s overwhelming brightness; but during an eclipse it is possible to take photographs of them. If the same constellation is photographed at night one can measure the distances on the two photographs, and check the predicted effect.

Now the impressive thing about this case is the risk involved in a prediction of this kind. If observation shows that the predicted effect is definitely absent, then the theory is simply refuted. The theory is incompatible with certain possible results of observation — in fact with results which everybody before Einstein would have expected.[1] This is quite different from the situation I have previously described, when it turned out that the theories in question were compatible with the most divergent human behaviour, so that it was practically impossible to describe any human behaviour that might not be claimed to be a verification of these theories.

These considerations led me in the winter of 1919-20 to conclusions which I may now reformulate as follows.

(1) It is easy to obtain confirmations, or verifications, for nearly every theory — if we look for confirmations.

(2) Confirmations should count only if they are the result of risky predictions; that is to say, if, unenlightened by the theory in question, we should have expected an event which was incompatible with the theory — an event which would have refuted the theory.

(3) Every “good” scientific theory is a prohibition: it forbids certain things to happen. The more a theory forbids, the better it is.

(4) A theory which is not refutable by any conceivable event is non-scientific. Irrefutability is not a virtue of a theory (as people often think) but a vice.

(5) Every genuine test of a theory is an attempt to falsify it, or to refute it. Testability is falsifiability; but there are degrees of testability; some theories are more testable, more exposed to refutation, than others; they take, as it were, greater risks.

(6) Confirming evidence should not count except when it is the result of a genuine test of the theory; and this means that it can be presented as a serious but unsuccessful attempt to falsify the theory. (I now speak in such cases of corroborating evidence.)

(7) Some genuinely testable theories, when found to be false, are still upheld by their admirers, for example by introducing ad hoc some auxiliary assumption, or by re-interpreting the theory ad hoc in such a way that it escapes refutation. Such a procedure is always possible, but it rescues the theory from refutation only at the price of destroying, or at least lowering, its scientific status. (I later described such a rescuing operation as a conventionalist twist or a conventionalist stratagem.)

One can sum up all this by saying that the criterion of the scientific status of a theory is its falsifiability, or refutability, or testability.