Economics and Replicability

The key element of any true empirical result is replicability. If a result is not reproducible, then the purported effect was almost surely the product of random chance, error, or fraud. This is not limited to the social sciences, of course. Two fairly famous incidents from the natural sciences highlight the importance of replicability in sorting knowledge from nonsense. One of the more famous erroneous claims was the announcement by researchers at the University of Utah that they had successfully produced cold fusion. In fairly short order, other scientists tried and failed to replicate the result, and cold fusion was discredited.

More recently, a famous researcher at Lucent’s Bell Labs was caught engaging in outright fraud:

Yet recently, an internal report from Lucent Technologies’ Bell Laboratories concluded that data in 16 published papers authored by researcher Hendrik Schön were fraudulent. The papers were reviewed and accepted by several prestigious scientific journals, including Nature, Science, Physical Review, and Applied Physics Letters. Yet, in many of the papers, the fraud was obvious, even to an untrained eye, with data repeated point-for-point and impossibly smooth or noise-free. All the papers passed through internal review at Bell Labs, one of the world’s foremost industrial research institutions, and the journal peer review system without raising alarms. The fraud was discovered only after journal readers started pointing it out.

Now, Angry Bear contributor Karsten alerts me to the blog of David Tufte, an economics professor at Southern Utah University, who wrote a post titled “Uh-Oh”:

In short, 1) economists publish papers with results that are not replicable, and 2) few make enough effort to notice.

In turn, Tufte is referencing “Lessons from the JMCB Archive” by B.D. McCullough, Kerry Anne McGeary, and Teresa D. Harrison (Journal of Money, Credit, and Banking, forthcoming). Beginning in 1996, the JMCB implemented a policy requiring authors publishing empirical results to submit both the data and the code used to generate the results in the paper.

McCullough, McGeary, and Harrison examined the archive and found that:

  • Out of 193 papers with empirical results, only 69 authors had followed the stated policy and submitted anything to the archive.
  • Of those 69, 11 submitted data but not the code used to generate their results, and 58 submitted both.
  • After corresponding with authors, they were eventually able to get sufficient data and code to attempt to replicate 62 published results.
  • In their 62 replication attempts, McCullough, McGeary, and Harrison succeeded in only 14 instances, or roughly 23% of the time.
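The arithmetic behind these counts is worth making explicit. A quick sketch, using only the figures quoted in the list above:

```python
# Figures from the McCullough, McGeary, and Harrison archive study,
# as quoted above.
papers_with_results = 193  # empirical papers published under the policy
submitted_anything = 69    # authors who submitted data, code, or both
attempts = 62              # replications actually attempted
successes = 14             # replications that succeeded

compliance_rate = submitted_anything / papers_with_results
success_rate = successes / attempts

print(f"compliance: {compliance_rate:.0%}, "
      f"replication success: {success_rate:.0%}")
```

So barely a third of authors complied with the archive policy at all, and fewer than a quarter of the attempted replications succeeded.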

Now, this all looks fairly damning, and to some extent it is. McCullough, McGeary, and Harrison blame bad incentives: because there is virtually no market for publishing results that simply replicate previous work, economists have little incentive to engage in replication. And because other economists are unlikely to attempt replication, researchers have little incentive to ensure that their research is in fact replicable (often this is more a matter of carefully archiving data and commenting programs than of honesty or dishonesty).
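To make the archiving point concrete, here is a hypothetical sketch of the kind of check a data-and-code archive makes possible: rerun the archived estimation on the archived data and compare the regenerated estimate to the published figure. The data, the published value, and the tolerance are all invented for illustration, not drawn from the JMCB study.

```python
def ols_slope(x, y):
    """Ordinary least squares slope for a one-regressor model."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    return cov / var

# "Archived" data and the estimate reported in a hypothetical paper.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
published_slope = 1.99

replicated_slope = ols_slope(x, y)
tolerance = 0.05  # how close counts as a successful replication
replicates = abs(replicated_slope - published_slope) < tolerance

print(f"replicated slope: {replicated_slope:.3f}, "
      f"matches published value: {replicates}")
```

The point is not the regression itself but the workflow: with the data and a commented program on file, a check like this takes minutes; without them, it may be impossible.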

However, I think their paper overstates the extent of the problem. In particular, they correctly note that there is little incentive to publish results that are merely replications, but I think they understate the amount of unobserved replication that occurs. Isaac Newton once said, “If I have seen farther than others, it is because I was standing on the shoulders of giants.” Empirical economics proceeds in similar fashion: the bulk of published papers adapt and refine earlier analyses in order to test new theories, to take advantage of increased computing power, or to implement new econometric methodologies. Speaking from experience, each of these will typically entail an incidental replication as a starting point. Of course, the final published result is not the replication, but the new result.

I’ve got more to say on this subject, but this post is already long and right now I need to go generate some hopefully reproducible results. I’ll try to come back to this topic later this week.