We must rely on forecasts by computer models. Are they reliable?

Summary: Computer models have opened a new era across the many fields of science. Our confidence in their forecasts has opened a new era in scientists’ ability to influence public policy. Now come the questions. How can we determine the accuracy and reliability of these models, knowing when their forecasts deserve our confidence? What can make them more useful?

“The criterion of the scientific status of a theory is its falsifiability, or refutability, or testability.”
— Karl Popper in Conjectures and Refutations: The Growth of Scientific Knowledge (1963).

“Probably {scientists’} most deeply held values concern predictions: they should be accurate; quantitative predictions are preferable to qualitative ones; whatever the margin of permissible error, it should be consistently satisfied in a given field; and so on.”
— Thomas Kuhn, The Structure of Scientific Revolutions (1962).

Forecasting with models

About predictions

Thomas Kuhn explained that predictions are the gold standard that often decides the winner between competing scientific theories (“paradigms”). The Structure of Scientific Revolutions described failed predictions that undermined the current dominant paradigm (the Michelson–Morley experiment) and successful predictions that helped establish new paradigms (the orbit of Mercury).

With the increasing prominence of science in public policy debates, the public’s beliefs about theories also have effects. Playing to this larger audience, scientists have developed an effective tool: computer models making bold forecasts about the distant future. Many fields have been affected, such as health care, ecology, astronomy, and climate science. With their conclusions amplified by activists, long-term forecasts have become a powerful lever to change public opinion.

Unfortunately, models are vulnerable to confirmation bias in their construction and selection (a problem increasingly recognized, for example in the testing of drugs). Equally problematic are issues of measuring their reliability and — more fundamentally — validation (e.g., falsification).

Peer-review has proven quite inadequate to cope with these issues (which lie beyond the concerns about peer-review’s ability to cope with even standard research). A review or audit of a large model often requires a man-year or more of work by a multidisciplinary team of experts, the kind of audit seldom done even on projects of great public concern.

Two introductions to computer modeling

(1)  The climate sciences have the highest profile and most controversial use of computer models for forecasting. For a look at their use see “Questioning the robustness of the climate modeling paradigm” at Judith Curry’s (Prof Atmospheric Science, GA Inst Tech) website, discussing a paper by Alexander Bakker. His conclusion…

The paradigm that GCMs are the superior tools for climate change assessments and that multi-model ensembles are the best way to explore epistemic uncertainty has lasted for many decades and still dominates global, regional and national climate assessments. Studies based on simpler models than the state-of-the-art GCMs or studies projecting climate response outside the widely accepted range have always received less credence. In later assessments, the confirmation of old results has been perceived as an additional line of evidence, but likely the new studies have been (implicitly) tuned to match earlier results.

Shortcomings, like the huge biases and ignorance of potentially important mechanisms, have been routinely and dutifully reported, but a rosy presentation has generally prevailed. Large biases seriously challenge the internal consistency of the projected change, and consequently they challenge the plausibility of the projected climate change.

Most climate change scientists are well aware of this and a feeling of discomfort is taking hold of them. Expression of the contradictions is often not countered by arguments, but with annoyance, and experienced as non-constructive. “What else?” or “Decision makers do need concrete answers” are often heard phrases. The ’climate modelling paradigm’ is in ’crisis’. It is just a new paradigm we are waiting for.

Professor Curry has written about the problematic aspects of the current generation of global coupled atmosphere-ocean models, and ways to improve them:

(2)  For a useful introduction to the subject — and recommendations — see this article by two ecologists: “Are Some Scientists Overstating Predictions? Or How Good are Crystal Balls?”, Tom Stohlgren and Dan Binkley, EcoPress, 28 October 2014 — Excerpt:

We found a particularly enlightening paper with the enticing title, “The good, the bad, and the ugly of predictive science”. It explained a sort of common knowledge in the field that the foundation of large-scale predictable relationships was full of tradeoffs. The authors remind us that any mathematical or numerical model gains credibility by understanding the trade-offs between:

  1. Improving the fidelity to test data,
  2. Studying the robustness of predictions to uncertainty and lack-of-knowledge, and
  3. Establishing the “prediction looseness,” of the model. Prediction looseness here refers to the range of predictions expected from a model or family of models along the way.

… Given the … large and unquantifiable uncertainties in many long-term predictions, we think all predictions should be:

  1. stated as hypotheses,
  2. accompanied by short-term predictions with acceptance/rejection criteria,
  3. accompanied by simple monitoring to verify and validate projections,
  4. carefully communicated with model caveats and estimates of uncertainties.
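As a minimal sketch of what the first two recommendations could look like in practice, the Python below records a forecast as a hypothesis with an acceptance/rejection criterion fixed before the observation arrives. The class name, fields, and numbers are all hypothetical illustrations, not anything proposed by Stohlgren and Binkley.

```python
from dataclasses import dataclass


@dataclass
class ForecastHypothesis:
    """A prediction stated as a testable hypothesis (all fields are illustrative)."""
    name: str
    predicted: float   # short-term prediction, stated in advance
    tolerance: float   # permissible error, also stated in advance
    caveats: str       # model caveats communicated with the forecast

    def evaluate(self, observed: float) -> str:
        """Apply the acceptance/rejection criterion to a later observation."""
        error = abs(observed - self.predicted)
        return "accepted" if error <= self.tolerance else "rejected"


# Hypothetical numbers, chosen only to show both outcomes.
h = ForecastHypothesis(
    name="five-year change",
    predicted=0.20,
    tolerance=0.10,
    caveats="assumes no major change in inputs",
)
print(h.evaluate(0.27))  # within the stated tolerance
print(h.evaluate(0.35))  # outside the stated tolerance
```

The point of the design is only that the tolerance is written down with the forecast, so monitoring can later verify or reject it without argument about what counted as success.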



We have few — and often no — alternatives to forecasts by computer models. They are an essential tool to guide us through an increasingly complex world. Their successful use requires more thought about ways to validate them. Too often their results appear in the news with little more than exhortations to “trust us”. That’s not enough when dealing with matters of public safety, often requiring vast expenditures.

The good news: there are immediate procedural changes scientists can take today to improve the testability of their results. Human nature being what it is, they’ll not do so without outside pressure.

Appendix: other predictions

For an example of the problem see Edward O. Wilson’s forecast in The Diversity of Life (1992). Despite the many studies proving it quite false (e.g., Nature 2011, the Committee on Recently Extinct Organisms, the IUCN Red List), it’s still widely cited. This analysis implies that 600 thousand species have gone extinct since 1992. No estimate shows more than a small fraction of that.

There is no way to measure the absolute amount of biological diversity vanishing year by year in rain forests around the world, as opposed to percentage losses, even in groups as well known as the birds. Nevertheless, to give an idea of the dimension of the hemorrhaging, let me provide the most conservative estimate that can be reasonably based on our current knowledge of the extinction process. I will consider only species being lost by reduction in forest area. … Even with these cautious parameters, selected in a biased manner to draw a maximally optimistic conclusion, the number of species doomed each year is 27,000.

Here are other examples of blown or exaggerated predictions. Although not all made using computer models, they show the powerful impact of authoritative statements by scientists — magnified by activists. The common elements are that those involved remain unapologetic about their errors, there is little cost to making predictions, and few of those involved show signs of learning from their mistakes (probably because there was no cost).

  1. “13 Worst Predictions Made on Earth Day, 1970”, Jon Gabriel, FreedomWorks, 22 April 2013.
  2. “Embarrassing Predictions Haunt the Global-Warming Industry”, Alex Newman, The New American, 12 August 2014.
  3. Other examples of models forecasting unrealistically high rates of extinction.

For More Information

See all posts about forecasting and about computer models. Also see these articles…

  1. “The good, the bad, and the ugly of predictive science”, F. M. Hemez and Y. Ben-Haim, 4th International Conference on Sensitivity Analysis of Model Output (2004).
  2. “A tentative taxonomy for predictive models in relation to their falsifiability”, Marco Viceconti, Philosophical Transactions of the Royal Society A, 3 October 2011.
  3. “Can we trust climate models?”, J. C. Hargreaves and J. D. Annan, Wiley Interdisciplinary Reviews: Climate Change, July/August 2013.
  4. “A model world”, Jon Turney, Aeon, 16 December 2013 — “In economics, climate science and public health, computer models help us decide how to act. But can we trust them?”

25 thoughts on “We must rely on forecasts by computer models. Are they reliable?”

  1. This ties in with the rise of public relations firms. These are our new class of professional bamboozlers. That’s right. For not too much money you can now hire skilled and experienced professionals to convince everyone that almost anything is true. In their worst nightmares the founders never imagined this.

    1. Peter,

      Great observation! The rise of PR agents is an important but little-recognized factor in US society. There are more PR flacks, they’re better paid than journalists, and since WW1 their methods have grown far more powerful.

      For a good intro see Pew Research’s “The growing pay gap between journalism and public relations”, 11 August 2014. Excerpt:

      The salary gap between public relations specialists and news reporters has widened over the past decade – to almost $20,000 a year, according to 2013 U.S. Bureau of Labor Statistics data analyzed by the Pew Research Center. At the same time, the public relations field has expanded to a degree that these specialists now outnumber reporters by nearly 5 to 1 …


  2. These are great references. Thanks for your research.
    Check out the almost hysterical article by Hunziker in Counterpunch implying that only boots on the ground and gut feelings are sufficient. A rising tide of irrationalism…


    1. Bill,

      Thanks for the link to a wonderful example of the Left’s abandoning science: “Abrupt Climate Change is Here” by Robert Hunziker, CounterPunch, 2 February 2015.

      Hunziker not only fails to mention anything in the peer-reviewed literature or the IPCC, but his “analysis” contradicts them. He focuses on methane, a powerful greenhouse gas which alarmists believe might end civilization. He neglects to mention that AR5, the recent IPCC report, expresses strong skepticism. For quotes, data and details see More good news about climate change from the IPCC: no sign yet of the methane apocalypse.

      He also quotes as definitive authorities scientists such as Peter Wadhams, often criticized for extreme views by mainstream climate scientists.

      Pure alarmism, targeting credulous people unfamiliar with the issue.


  3. Anyone who claims that an effectively infinitely large open-ended non-linear feedback-driven (where we don’t know all the feedbacks, and even the ones we do know, we are unsure of the signs of some critical ones) chaotic system – hence subject to inter alia extreme sensitivity to initial conditions – over any significant time period is either a charlatan or a computer salesman.

    Ironically, the first person to point this out was Edward Lorenz – a climate scientist. You can add as much computing power as you like, the result is purely to produce the wrong answer faster. Even the IPCC realised this – once.

    …in climate research and modelling, we should recognise that we are dealing with a coupled non-linear chaotic system, and therefore that the long-term prediction of future climate states is not possible … {IPCC 2001 section page 774}

    Even closed loop models such as are used in engineering, which can be iteratively tested, modified and retested ad infinitum, are not infallible. Witness Boeing’s problems with its latest fully composite 787 Dreamliner: its structure was modelled on the most powerful supercomputers available, yet it developed cracks in its wing root.

    Plus of course the financial troubles of 2008 occurred despite huge numbers of computer predictions – all based on Black-Scholes (BS – ironic, eh?) financial calculus reassuring the bankers that their extreme leverage ratios were completely safe.


    1. Oops, missed a bit about “is capable of making any predictions worth a damn” in the first para – however, I think it still makes sense!


    2. Catweazle,

      Every day I learn something. I didn’t know that Edward Lorenz was, among other things, a meteorologist.

      “Even closed loop models … are not infallible”

      All valid points. But I think stated a bit harshly. Nobody looks for infallibility, just reliability that is — literally — good enough for public policy.

      “financial troubles of 2008 occurred despite huge numbers of computer predictions”

      This is, imo, closer to the mark, and something I’ve long wanted to write about (would take too long for research, unfortunately). The models showing that all was OK (and there was a wide range of them, not just those using Black-Scholes, e.g., econometric models) were used … inappropriately. That is, to prove a predetermined “fact”. That is imo the #1 weakness of computer models — unless they are used in a HIGHLY controlled fashion they are confirmation bias in tangible form. The tools exist for sound use — testing vs. out of sample data, careful selection to avoid “cherry picking” parameters with pleasing outcomes, audit by outside experts in the relevant specialties, etc. But these measures defeat the purpose, and hence are seldom used.
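      To make “testing vs. out of sample data” concrete, here is a minimal sketch in Python. It fits a straight line to the first 20 points of a made-up series and then checks the fit against 10 later points the fit never saw. The series and the helper names (fit_line, mean_abs_error) are assumptions for illustration only, not any real financial or climate model.

```python
def fit_line(ts, ys):
    """Ordinary least-squares fit of y = slope * t + intercept."""
    n = len(ts)
    mean_t, mean_y = sum(ts) / n, sum(ys) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    sxx = sum((t - mean_t) ** 2 for t in ts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t


def mean_abs_error(ts, ys, slope, intercept):
    """Average absolute gap between the fitted line and the data."""
    return sum(abs(y - (slope * t + intercept)) for t, y in zip(ts, ys)) / len(ts)


# Made-up "observations" whose growth slows over time.
ts = list(range(1, 31))
ys = [t ** 0.5 for t in ts]

# Fit on the first 20 points only; hold back the last 10 as out-of-sample data.
train_t, train_y = ts[:20], ys[:20]
test_t, test_y = ts[20:], ys[20:]

slope, intercept = fit_line(train_t, train_y)
in_sample = mean_abs_error(train_t, train_y, slope, intercept)
out_of_sample = mean_abs_error(test_t, test_y, slope, intercept)

print(f"in-sample error:     {in_sample:.3f}")
print(f"out-of-sample error: {out_of_sample:.3f}")
```

      The line looks respectable against the data used to build it and noticeably worse against the held-back data, which is the kind of warning an in-sample check alone never gives.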

      By the way, there were models that accurately predicted the crisis. For example, mortgage-backed security models showed clearly that the primary factor affecting default of prime mortgages was per cent equity — and the people defaulted in large numbers on non-recourse mortgages with negative equity. As we saw during the 1980s Texas oil bust, among other examples. But people preferred to be blind.

      I mentioned these facts in late 2007 to a senior MBS trader at a major investment bank. As a rebuttal he banged his fist on the desk, red-faced, shouting “prime mortgages do not default!” Sadly, his enthusiasm didn’t counteract the numbers.


    3. “But I think stated a bit harshly. Nobody looks for infallibility, just reliability that is — literally — good enough for public policy.”

      Tell that to the pensioners who, as a result of the enormous increase in their energy bill – have died because they had to make the choice between eating and heating.

      As a retired engineer who once worked on mission critical projects where even slight errors may well have ended up with a large smoking hole in the ground, I feel entitled to judge harshly the sort of slack buffoons who insist that their computer games climate models are sufficiently robust that credulous fools are prepared to trash the economies of most of Western industrial nations!

      Tell me, would you permit your children to fly on an aeroplane that was designed by climate scientists?


    4. catweazle,

      “Tell that to the pensioners who, as a result of the enormous increase in their energy bill – have died because they had to make the choice between eating and heating.”

      I don’t understand the relevance of this objection. It sounds like cheap rhetoric. First, every public policy change has ill effects. Innocent people die in every construction project — and every war. Every cut of aid will affect innocents. To not act because there is harm done means nothing will be done. The question is balancing gain vs cost.

      Second, where are these people you refer to? How many? Some evidence for such bold claims is needed. As an engineer you wouldn’t make such specific statements without evidence.

      “As a retired engineer who once worked on mission critical projects where even slight errors may well have ended up with a large smoking hole in the ground”

      Your comment specifically said “infallible”. If you claim your work was infallible, I must discount any claim to expertise. That’s quite daft.

      (2) “the sort of slack buffoons who insist that their computer games climate models are sufficiently robust that credulous fools are prepared to trash the economies of most of Western industrial nations! Tell me, would you permit your children to fly on an aeroplane that was designed by climate scientists?”

      Can you, as a good engineer, provide data to support your assertions about the skills, beliefs and public policy recommendations of most — or even a large fraction — of climate scientists? You write as if this highly diverse group (most don’t have PhDs in “climate science”, but in a wide range of fields) is some kind of unitary entity, allowing such bold generalizations. I am unaware of such data, and wonder if you’re just making stuff up.

      In fact we know little about the views of climate scientists — other than several well-executed surveys show that almost all agree with the IPCC’s AR5 statement that “It is extremely likely (95 – 100% certain) that human activities caused more than half of the observed increase in global mean surface temperature from 1951 to 2010.” There is weaker evidence that most of those that disagree do so with the IPCC’s confidence level (e.g., Judith Curry), usually on epistemological grounds.

      See summaries of (and links to) the surveys:

      1. Climate scientists speak to us. What is their consensus opinion?
      2. Puncturing the false picture of a scientific consensus about the causes and effects of global warming


  4. Excellent post. FM might find these articles interesting:

    Beware the Big Errors of ‘Big Data,’ Nassim Taleb, WIRED magazine, February 2013.

    We’re more fooled by noise than ever before, and it’s because of a nasty phenomenon called “big data.” With big data, researchers have brought cherry-picking to an industrial level.

    Modernity provides too many variables, but too little data per variable. So the spurious relationships grow much, much faster than real information.

    In other words: Big data may mean more information, but it also means more false information.

    WARNING: Physics Envy May Be Hazardous To Your Wealth!

    [O]nly the smallest fraction of economic writings, theoretical and applied, has been concerned with the derivation of operationally meaningful theorems. In part at least this has been the result of the bad methodological preconceptions that economic laws deduced from a priori assumptions possessed rigor and validity independently of any empirical human behavior. But only a very few economists have gone so far as this. The majority would have been glad to enunciate meaningful theorems if any had occurred to them. In fact, the literature abounds with false generalization.

    We do not have to dig deep to find examples. Literally hundreds of learned papers have been written on the subject of utility. Take a little bad psychology, add a dash of bad philosophy and ethics, and liberal quantities of bad logic, and any economist can prove that the demand curve for a commodity is negatively inclined.

    The statistical crisis in science, American Scientist, November-December 2014.

    There is a growing realization that reported “statistically significant” claims in scientific publications are routinely mistaken. Researchers typically express the confidence in their data in terms of p-value: the probability that a perceived result is actually the result of random variation. The value of p (for “probability”) is a way of measuring the extent to which a data set provides evidence against a so-called null hypothesis. By convention, a p-value below 0.05 is considered a meaningful refutation of the null hypothesis; however, such conclusions are less solid than they appear.

    “The effect of correlation in false discovery rate estimation,” Armin Schwartzman and Xihong Lin, Biometrika, 2011 Mar; 98(1): 199–214.

    Surviving Data Science “at the Speed of Hype,” John Foreman, Data Scientist, 30 January 2015.

    A lot of vendors want to cast the problem as a technological one. That if only you had the right tools then your analytics could stay ahead of the changing business in time for your data to inform the change rather than lag behind it.

    This is bullshit. As Kevin Hillstrom put it recently:

    If IBM Watson can find hidden correlations that help your business, then why can’t IBM Watson stem a 3 year sales drop at IBM?

    1. Thomas,

      All great stuff. Thank you for posting. It has taken a long time, but slowly a more skeptical view is emerging of computer models.

      Eventually, I am sure, standards for their construction and mechanisms for their validation will evolve. People then will look back on these early days and laugh at our naïveté.


  5. Computer models depend on math for their accuracy. Computers merely speed up the process of calculation, but ultimately the assumptions and the mathematical details of the model make or break the accuracy of the model.

    Unfortunately, many scientists use math — statistics in particular — without realizing the assumptions or the details involved in the math. This is not surprising, since scientists typically have great expertise within their narrow field, but usually lack expertise in statistics, differential equations, and other specialized mathematical areas.

    This becomes a huge problem when scientists assume, for example, that the underlying statistical distribution is a Gaussian or normal distribution, and base their analysis on that assumption. We now know that most economic distributions are not Gaussian, but power-law distributions with much longer fatter tails. The distribution of wealth among the U.S. population is a power-law distribution: this means that that there exist many more billionaires in America than we’d expect from a Gaussian distribution, and the wealth of the wealthiest American billionaires is much greater than we’d expect from a Gaussian distribution. Likewise, the frequency of major economic depressions is much higher than we’d expect from a Gaussian distribution, and the severity of the worst economic downturns proves orders of magnitude greater than a Gaussian distribution would suggest.

    Scientists also tend to naïvely assume that more data = better predictions. But, as Taleb and others have pointed out, random matrix theory tells us that if you sieve large enough data-sets, you get inordinately larger numbers of spurious correlations and false positives than you’d expect from a naïve application of the chi-square test, Student’s T-distribution, the bootstrap test, and so on. All those tests presuppose an underlying statistical universe which is unbiased — but Big data now lets researchers run searches until they find correlations that support their hypothesis, and then stop. This biases the data selection in the same way that picking names from the phone book would.
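    A minimal sketch of how searching many variables manufactures false positives, assuming nothing but pure noise. It is a plain multiple-comparisons simulation in Python, not random matrix theory; the sizes, the seed, and the helper name pearson_r are illustrative assumptions.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(0)      # fixed seed so the run is repeatable
N_OBS = 20          # observations per variable
N_VARS = 1000       # candidate variables searched
R_CRIT = 0.444      # approx. two-sided 5% critical value of r for 20 observations

# The target and every candidate are independent noise: no real relationship exists.
target = [random.gauss(0, 1) for _ in range(N_OBS)]
hits = 0
for _ in range(N_VARS):
    candidate = [random.gauss(0, 1) for _ in range(N_OBS)]
    if abs(pearson_r(candidate, target)) > R_CRIT:
        hits += 1

print(f"{hits} of {N_VARS} pure-noise variables pass the 5% significance test")
```

    Roughly one candidate in twenty clears the bar by chance, so a researcher who searches until something “works” and then stops will always find something to report.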

    Lastly, the differential equations underlying economic models fail to take shocks into account. Ordinary differential equations behave linearly within boundary values which have low Reynolds numbers. But if the Reynolds number becomes very high (i.e., drastic shocks to the system), the equations grow highly non-linear and turbulence results — unpredictable chaotic turbulence of the kind Edward Lorenz discovered in his weather models in the early 1960s. This led to the well-known “butterfly effect” in which a butterfly flapping its wings in the Amazon could eventually produce hurricanes in the Pacific. We now know that the same kind of behavior can occur even in well-regulated economies. Our economic system, like the global weather system, is metastable — usually homeostatic and self-regulating in the sense that small variations tend to damp down and die out…but large shocks can grow exponentially and produce catastrophic self-sustaining disruptions, like typhoons, or Great Depressions.
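    Sensitivity to initial conditions can be seen in a few lines of Python. This sketch uses the logistic map, a standard textbook chaotic system, as a stand-in; it is not Lorenz's weather model and not an economic model, and the starting values are arbitrary assumptions.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x), which is chaotic for r = 4."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two runs whose starting points differ by one part in ten billion.
a = logistic_trajectory(0.2, 80)
b = logistic_trajectory(0.2 + 1e-10, 80)

gaps = [abs(p - q) for p, q in zip(a, b)]
early_gap = gaps[5]        # still tiny after a few steps
late_gap = max(gaps[40:])  # the two runs have long since parted ways

print(f"gap after 5 steps:       {early_gap:.2e}")
print(f"largest gap, steps 40-80: {late_gap:.2f}")
```

    The tiny initial difference roughly doubles each step, so within a few dozen iterations the two runs bear no resemblance to each other, even though the rule generating them is simple and fully deterministic.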

    Alas, major schools of economic thought like Rational Choice theory or the Chicago School of Economics recognize none of these complexities and thus give false predictions under extreme economic conditions, like major depressions.


  6. The dirty secret is that most of these ‘computer models’ are written by scientists, not software engineers. Scientists are great at writing one-off scripts to pipeline a specific data set into another data set. They are not so good at combining them together into a cohesive whole.

    http://mc-computing.com/qs/Global_Warming/Model_E/Source_Code.html (source code here: http://simplex.giss.nasa.gov/snapshots/modelE2_AR5_branch.2014.10.31_01.45.44.tgz) as an example.

    1. Johnny,

      Thanks for mentioning this. I’ve heard that, but haven’t found references. A similar problem is that climate science involves a great deal of highly sophisticated statistical analysis — and they seldom consult statisticians.

      The larger problem is that climate science is grossly underfunded. They do the best they can with the available funds, but the apparatus — data collection and analysis — is often quite ramshackle behind the confident facade.


    2. cat,

      No, I’m not kidding. The satellites are aging, with almost no redundancy. The surface temperature data has almost no quality control. We spend vast amounts on research with little coordination and less quality control. Valuable ice cores sit unexamined in freezers. The crucial dendrochronology data used for paleoclimate reconstructions has not been updated — which would allow better accuracy vs. temperature records.

      Much of this could have been done with the funds blown on the hypersonic missile (the X-51) — which is nowhere near completion, and which has no obvious use. We might have been able to answer every current question in climate science with the funds blown on the F-35 — in production but fails to meet many of its basic product specs. And is probably unnecessary.


    3. “The satellites are aging,”

      The one that produced this isn’t: averaged CO2 concentration in Oct – Nov 2014 from OCO-2. That may well be the most instructive satellite data ever produced. Just look where most of the CO2 appears to be coming from. Admittedly, it’s only about six weeks of data, but given that it represents Northern hemisphere winter – when it would be expected that CO2 production would be at its highest – it indicates that the received wisdom on CO2 production – i.e., that the vast majority of it comes from the Western industrial nations – is highly suspect.

      Here is the site from which the graphic is taken: “NASA’s Spaceborne Carbon Counter Maps New Details”, 18 Dec 2014. For comparison, here is the previous NASA computer game climate model simulation of where they thought the CO2 was coming from.

      Compare the months shown on the simulation with the period of the satellite data.


  7. Clearly, computer models are absolutely accurate and reliable. The recent prediction of 36 inches of snow in NYC a full 18 hours ahead of time saved the city from almost 2 actual inches of snow. Now come the 100-year models filled with credibility and “settled science”. Right.

    1. Bill,

      “Clearly, computer models are absolutely accurate and reliable.”

      Isn’t that too binary? AKA a false dilemma logical fallacy? Who claims “absolute accuracy” for their models? In fact meteorologists accurately identified the strength of the storm. And they moderately accurately forecast its course, being roughly 25 miles off. Their big error was excessively confident precision in their public statements, supported neither by the models’ known accuracy nor by past history. More modest forecasts and they would have looked like heroes.

      This degree of hubris has unfortunately become quite common, for reasons mysterious to me. In that sense your critique is deadly accurate, which is the subject of this post. How do we validate computer models’ long-term forecasts? Certainly the “settled science” boasts reek of hubris (if this were a Greek play their boasts in 1998-2000 would be punished by a 15+ year pause in warming). Exaggerating the breadth and depth of the consensus among climate scientists also erodes their credibility.

      For more about the lessons learned from Winter Storm Juno:

