Climate scientists can restart the climate policy debate & win: test the models!

Larry Kummer, Editor

9 years ago

Summary; Public policy about climate change has become politicized and gridlocked after 26 years of large-scale advocacy. We cannot even prepare for a repeat of past extreme weather. We can whine and bicker about who to blame. Or we can find ways to restart the debate. Here is the next of a series about the latter path, for anyone interested in walking it. Climate scientists can take an easy and potentially powerful step to build public confidence: re-run the climate models from the first 3 IPCC reports with actual data (from their future): how well did they predict global temperatures?

“Confirmations should count only if they are the result of risky predictions; that is to say, if, unenlightened by the theory in question, we should have expected an event which was incompatible with the theory — an event which would have refuted the theory.”
— Karl Popper in Conjectures and Refutations: The Growth of Scientific Knowledge (1963).

The most important graph from the IPCC’s AR5

Figure 1.4 from AR5: Estimated changes in the observed globally and annually averaged surface temperature anomaly relative to 1961–1990 (in °C) since 1950 compared with the range of projections from the previous IPCC assessments. Click to enlarge.

Why the most important graph doesn’t convince the public

Last week I posted What climate scientists did wrong and why the massive climate change campaign has failed. After 26 years, one of the largest longest campaigns to influence public policy has failed to gain the support of Americans, with climate change ranking near the bottom of people’s concerns. It described the obvious reason: they failed to meet the public’s expectations for behavior of scientists warning about a global threat (i.e., a basic public relations mistake).

Let’s discuss what scientists can do to restart the debate. Let’s start with the big step: show that climate models have successfully predicted future global temperatures with reasonable accuracy.

This spaghetti graph — probably the most-cited data from the IPCC’s reports — illustrates one reason for lack of sufficient public support in America. It shows the forecasts of models run in previous IPCC reports vs. actual subsequent temperatures, with the forecasts run under various scenarios of emissions and their baselines updated. First, Edward Tufte probably would laugh at this The Visual Display of Quantitative Information — too much packed into one graph, the equivalent of a Powerpoint slide with 15 bullet points.

But there’s a more important weakness. We want to know how well the models work. That is, how well each forecast if run with a correct scenario (i.e., actual future emissions, since we’re uninterested here in predicting emissions, just temperatures). Let’s prune away all those extra lines on the spagetti graph, leaving forecasts from 1990 to now that match the actual course of emissions.

The big step: prove climate models have made successful predictions

“A genuine expert can always foretell a thing that is 500 years away easier than he can a thing that’s only 500 seconds off.”
— From Mark Twain’s A Connecticut Yankee in King Arthur’s Court.

A massive body of research describes how to validate climate models (see below), most stating that they must use “hindcasts” (predicting the past) because we do not know the temperature of future decades. Few sensible people trust hindcasts, with their ability to be (even inadvertently) tuned to work (that’s why scientists use double-blind testing for drugs where possible).

But now we know the future — the future of models run in past IPCC reports — and can test their predictive ability.

Karl Popper believed that predictions were the gold standard for testing scientific theories. The public also believes this. Countless films and TV shows focus on the moment in which scientists test their theory to see if the result matches their prediction. Climate scientists can run such tests today for global surface temperatures. This could be evidence on a scale greater than anything else they’ve done.

A hurricane in the Weather Research & Forecasting (WRF) Model. From NCAR/UCAR.

Testing the climate models used by the IPCC

“Probably {scientists’} most deeply held values concern predictions: they should be accurate; quantitative predictions are preferable to qualitative ones; whatever the margin of permissible error, it should be consistently satisfied in a given field; and so on.”
— Thomas Kuhn in The Structure of Scientific Revolutions (1962).

The IPCC’s scientists run projections. AR5 describes these as “the simulated response of the climate system to a scenario of future emission or concentration of greenhouse gases and aerosols … distinguished from climate predictions by their dependence on the emission/concentration/radiative forcing scenario used…”. The models don’t predict CO2 emissions, which are an input to the models.

So they should run the models as they were originally run for the IPCC in the First Assessment Report (FAR, 1990), in the Second (SAR, 1995), and the Third (TAR, 2001) — for details see chapter 9 of AR5: Evolution of Climate Models. Run them using actual emissions as inputs and with no changes of the algorithms, baselines, etc. This is a hindcast using data from the “future” (after the model was created), a form of out-of-sample data. It would cost a pittance compared to the annual cost of climate science — and the stakes for the world. How accurately will the models’ output match the actual global average surface temperatures?

This was proposed by Roger Pielke Jr (Prof Environmental Studies, U CO-Boulder) in “Climate predictions and observations“, Nature Geoscience, April 2008.

Of course, the results would not be a simple pass/fail. Such a test would provide the basis for more sophisticated tests. Judith Curry (Prof Atmospheric Science, GA Inst Tech) explains here:

“Comparing the model temperature anomalies with observed temperature anomalies, particularly over relatively short periods, is complicated by the acknowledgement that climate models do not simulate the timing of ENSO and other modes of natural internal variability; further the underlying trends might be different. Hence, it is difficult to make an objective choice for matching up the observations and model simulations. Different strategies have been tried… matching the models and observations in different ways can give different spins on the comparison.”

On the other hand, we now have respectably long histories since publication of the early IPCC reports: 25, 20, and 15 years. These are not short periods, even for climate change. Models that cannot successfully predict over such periods require more trust than many people have when it comes to spending trillions of dollars — or even making drastic revisions to our economic system (as urged by Naomi Klein and Pope Francis).

Conclusion

“Trust can trump Uncertainty.”
— Presentation by Leonard A Smith (Prof of Statistics, LSE), 6 February 2014.

Re-run the models. Post the results. More recent models presumably will do better, but firm knowledge about the performance of the older models will give us useful information for the public policy debate. No matter what the results.

As the Romans might have said when faced with a problem like climate change: “Fiat scientia, ruat caelum.” (Let science be done though the heavens may fall.)

“In an age of spreading pseudoscience and anti-rationalism, it behooves those of us who believe in the good of science and engineering to be above reproach whenever possible.“
— P. J. Roach, Computing in Science and Engineering, Sept-Oct 2004 — Gated.

For More Information

(a) Please like us on Facebook, follow us on Twitter. For more information see The keys to understanding climate change and My posts about climate change. Also see these posts about models…

(b) See the papers about model validation listed in (f) below. But this is especially clear about the situation: “Reconciling warming trends” by Gavin A. Schmidt et al, Nature Geoscience, March 2014 — Ungated copy here.

“CMIP5 model simulations were based on historical estimates of external influences on the climate only to 2000 or 2005, and used scenarios (Representative Concentration Pathways, or RCPs) thereafter. Any recent improvements in these estimates or updates to the present day were not taken into account in these simulations.

“{We} collated up-to-date information on volcanic aerosol concentrations, solar activity and well-mixed greenhouse gases in the 1990s and 2000s. These updates include both newly observed data and also reanalyses of earlier 1990s data on volcanic aerosols based on improved satellite retrievals {and compared} the updated information with the data used in the CMIP5 climate model simulations …”

(c) This proposal is an obvious one; I don’t claim it is original. Roger Pielke Jr. (Prof of Environmental Studies, U CO-Boulder) made a similar proposal, but more complete in “Climate predictions and observations” (Nature Geoscience, April 2008). Also see “Carrick” in this Sept 2013 comment at Climate Audit) There are probably others.

Why has this test not been done? We can only guess.

(d) I learned much, and got several of these quotes, from a 2014 presentations by Leonard A Smith (Prof of Statistics, LSE): the abridged version “The User Made Me Do It” and the full version “Distinguishing Uncertainty, Diversity and Insight“. Also see “Uncertainty in science and its role in climate policy“, Leonard A. Smith and Nicholas Stern, Phil Trans A, 31 October 2011.

(e) Introductions to climate modeling.

These provide an introduction to the subject, and a deeper review of this frontier in climate science.

“A Model World” by Jon Turney in Aeon, 16 December 2013.
“Climate Modeling 101: What are climate models and why are they important?” by the National Academy of Science.
An introduction to climate models by the World Meteorological Society.
“The Physics of Climate Modeling” by Gavin A. Schmidt in Physics Today, January 2007.
Comparing the models used in AR4 (CMIP3) and AR5 (CMIP5).
“Evaluation of Climate Models“, chapter 9 in AR5.
Important: “A practical philosophy of complex climate modelling” by Gavin A. Schmidt and Steven Sherwood, European Journal for Philosophy of Science, May 2015 (ungated copy here).

Judith Curry (Prof Atmospheric Science, GA Inst Tech) reviews the literature about the uses and limitation of climate models…

(f) Selections from the large literature about validation of climate models.

(1) Any discussion of climate science should start with what the IPCC says. See AR5, WGI, Chapter 9: “Evaluation of Climate Models” for a long detailed analysis. The bottom line, from the Executive Summary:

Most simulations of the historical period do not reproduce the observed reduction in global mean surface warming trend over the last 10 to 15 years. There is medium confidence that the trend difference between models and observations during 1998–2012 is to a substantial degree caused by internal variability, with possible contributions from forcing error and some models overestimating the response to increasing greenhouse gas (GHG) forcing. Most, though not all, models overestimate the observed warming trend in the tropical troposphere over the last 30 years, and tend to underestimate the long-term lower stratospheric cooling trend.

(2) Perhaps the best known attempt at model validation is “Global climate changes as forecast by Goddard Institute for Space Studies three-dimensional model” by Hansen et el, Journal of Geophysical Research, 20 August 1988. Its skill is somewhat evaluated in “Skill and uncertainty in climate models” by Julia C. Hargreaves, WIREs: Climate Change, July/Aug 2010 (ungated copy). She reported that “efforts to reproduce the original model runs have not yet been successful”, so she examined results for the scenario that in 1988 Hansen “described as the most realistic”. How realistic she doesn’t say (no comparison of the scenarios vs. actual observations); nor can we know how the forecast would change using observations as inputs.

Two blog posts discuss this forecast (for people who care about such things): “Evaluating Jim Hansen’s 1988 Climate Forecast” (Roger Pielke Jr, May 2006) and “A detailed look at Hansen’s 1988 projections” (Dana Nuccitelli, Skeptical Science, Sept 2010).

(3) Also important is this evaluation of forecast in the IPCC’s First Assessment Report “Assessment of the first consensus prediction on climate change“, David J. Frame and Dáithí A. Stone, Nature Climate Change, April 2013. They evaluated the original projections (i.e., runs using simulations), which did not include the eruption of Mt. Pinatubo, the collapse of the Eastern Bloc economies, or the rapid growth of East Asia’s economies. Nor did they show the difference between the scenarios used and actual observations.

(4) “Recent Climate Observations Compared to Projections” by an all-star group of scientists — Stefan Rahmstorf, Anny Cazenave, John A. Church, James E. Hansen, Ralph F. Keeling, David E. Parker, Richard C. J. Somerville — in Science, 4 May 2007. Ungated copy here. This is often cited as proof of models’ forecasting skill. It makes no such claim. The paper is only one page long. It has one paragraph describing global surface temperature changes and one about sea levels. There is little description or analysis, and no statistical testing. Also note this claim, which evidence in the past few years reveals to be exaggerated at best. Models are tuned to match past data (details here), make extensive use of parametrization.

“Although published in 2001, these model projections are essentially independent from the observed climate data since 1990: Climate models are physics-based models developed over many years that are not ‘tuned’ to reproduce the most recent temperatures …”

(5) “Test of a decadal climate forecast“, Myles R. Allen et al, Nature Geoscience, April 2013 — Gated. A follow-up to “Quantifying the uncertainty in forecasts of anthropogenic climate change” (Allen et al, Nature, October 2000), evaluating one model’s forecasts using data through 1996 over the subsequent 16 years. They re-ran the model, but do not state if they used the original scenario or actual observations after 1996 to general the prediction. The forecast was significantly below consensus, and so quite accurate. Odd that this examination of it provided so little information.

Other articles about validation of models. Most are just the usual hindcasts.

“Potential climate impact of Mount Pinatubo eruption“, James Hansen et al, Geophysical Research Letters, 24 January 1992. Ungated copy. Nice validation of long-standing theory, including the early 1980s “nuclear winter” simulations.
“Irreducible imprecision in atmospheric and oceanic simulations” by James C. McWilliams in PNAS, 22 May 2007.
“How Well Do Coupled Models Simulate Today’s Climate?“, BAMS, March 2008 — Comparing models with the present, but defining “present” as the past (1979-1999).
Similar proposal to mine, but more complete: “Climate predictions and observations“, Roger Pielke Jr., Nature Geoscience, April 2008.
“Should we believe model predictions of future climate change?”, Reto Knutti, Philosophical Transactions A, December 2008.
“Confirmation and Robustness of Climate Models“, Elisabeth A. Lloyd, Philosophy of Science, December 2010. Ungated copy.
Important: “Should we assess climate model predictions in light of severe tests?”, Joel K. Katzav, Eros, 7 June 2011.
More hindcasting: “Skillful predictions of decadal trends in global mean surface temperature“, J. C. Fyfe et al, Geophysical Research Letters, November 2011. Gated; open draft here. Comments by Pielke Sr here.
“The Reproducibility of Observational Estimates of Surface and Atmospheric Temperature Change” by B. D. Santer, T. M. L. Wigley, and K. E. Taylor in Science, 2 December 2011. Discusses proof and replication, but in only binary terms — models showing warming or cooling. Nothing about matching the pattern of temperature change over 10 or 20 year periods (warming and pauses).
“Reliability of multi-model and structurally different single-model ensembles“, Tokuta Yokohata et al, Climate Dynamics, August 2012. Uses the rank histogram approach.
Important: “Assessing climate model projections: State of the art and philosophical reflections“, Joel Katzav, Henk Djikstra, and Jos de Laat, Studies in History and Philosophy of Science Part B, November 2012. Ungated copy.
“The Elusive Basis of Inferential Robustness“, James Justus, Philosophy of Science, December 2012. A creative look at a commonly given reason to trust GCMs.
“Real-time multi-model decadal climate predictions” by Doug M. Smith et al., Climate Dynamics, December 2012 — Gated. Open copy here. Hindcasts and forecasts. “Verification of these forecasts will provide an important opportunity to test the performance of models and our understanding and knowledge of the drivers of climate change.” Yes.
“Initialized near-term regional climate change prediction“, F. J. Doblas-Reyes, Nature Communications, 13 April 2013 — Hindcasts.
“Can we trust climate models?”, J. C. Hargreaves and J. D. Annan, Wiley Interdisciplinary Reviews: Climate Change, July/August 2013.
“Overestimated global warming over the past 20 years”, John C. Fyfe et al, Nature Climate Change, September 2013. Hindcasts.
Important: “Severe testing of climate change hypotheses“, Joel K. Katzav, Studies in History and Philosophy of Modern Physics, November 2013. Ungated copy.
“Can climate models explain the recent stagnation in global warming?“, H. Von Storch et al, 2013 — unpublished. Hindcast of models used in AR4 and AR5 vs. two scenarios.
Important: “Reconciling warming trends” by Gavin A. Schmidt et al, Nature Geoscience, March 2014 — Ungated copy here.
“Recent observed and simulated warming“, John C. Fyfe and Nathan P. Gillett, Nature Climate Change, March 2014 — Gated. “Fyfe et al. showed that global warming over the past 20 years is significantly less than that calculated from 117 simulations of the climate by 37 models participating in Phase 5 of the Coupled Model Intercomparison Project (CMIP5). This might be due to some combination of errors… It is this light that we revisit the findings of Fyfe and colleagues.”
“CMIP5 historical simulations (1850–2012) with GISS ModelE2“, RL Miller, Gavin Schmidt, et al, Journal of Advances Modeling Earth Systems, June 2014.
“Well-estimated global surface warming in climate projections selected for ENSO phase“, James S. Risbey et al, Nature Climate Change, September 2014. Hindcasting of CMIP5. Reported as “Study vindicates climate models accused of ‘missing the pause’“.
“Predictions of Climate Several Years Ahead Using an Improved Decadal Prediction System“, Jeff R. Knight et al, Journal of Climate, October 2014.
“The Robustness of the Climate Modeling Paradigm“, Alexander Bakker, Ph.D. thesis, VU University (2015).
“Comparing the model-simulated global warming signal to observations using empirical estimates of unforced noise“, Patrick T. Brown et al, Scientific Reports, April 2015.
“Uncertainties, Plurality, and Robustness in Climate Research and Modeling: On the Reliability of Climate Prognoses“, Anna Leuschner, Journal for General Philosophy of Science, 21 July 2015. Typical cheerleading; proof by bold assertion.
“Robust comparison of climate models with observations using blended land air and ocean sea surface temperatures“, Kevin Cowtan et al, Geophysical Research Letters, 15 August 2015. Open copy here.
“How well must climate models agree with observations?“, Dirk Notz, Philosophical Transactions A, 13 October 2015.
“Evaluation of forecasts by accuracy and spread in the MiKlip decadal climate prediction system“, Christopher Kadow et al, Meteorologische Zeitschrift, 2017. Hindcasts.
“Assessing temperature pattern projections made in 1989” by Ronald J. Stouffer and Syukuro Manabe in Nature Climate Change, March 2017. They compare the geographical pattern of warming in their 1989 model forecast vs observations. Limitations in their model cause “problems in comparing models to observations and makes the comparisons shown here qualitative in nature. It is one of the reasons why we focus our attention on the geographical distribution of surface temperature change rather than the magnitude of change in this study.”
“Reconciling the signal and noise of atmospheric warming on decadal timescales“, Roger N. Jones and James H. Ricketts, Earth System Dynamics, 8 (1), 16 March 2017.
“Apparent limitations in the ability of CMIP5 climate models to simulate recent multi-decadal change in surface temperature: implications for global temperature projections” in Climate Dynamics, July 2017 “Given that the same models are poorest in representing observed multi-decadal temperature change, confidence in the highest projections is reduced.”
“The epistemological status of general circulation models” by Craig Loehle, Climate Dynamics, March 2018.
“Validation of Climate Models: An Essential Practice” by Richard B. Rood in Computer Simulation Validation: Fundamental Concepts, Methodological Frameworks, and Philosophical Perspectives, Editors Claus Beisbart and Nichole J. Saam (2019). Post-review draft here.