Reinhart & Rogoff: Everyone makes coding mistakes, we need to make it easy to find them + Graphing uncertainty
You may have already seen a lot written on the replication of Reinhart & Rogoff’s (R & R) much cited 2010 paper done by Herndon, Ash, and Pollin. If you haven’t, here is a round up of some of some of what has been written: Konczal, Yglesias, Krugman, Cowen, Peng, FT Alphaville.
This is an interesting issue for me because it involves three topics I really like: political economy, reproducibility, and communicating uncertainty. Others have already commented on these topics in detail. I just wanted to add to this discussion by (a) talking about how this event highlights a real need for researchers to use systems that make finding and correcting mistakes easy, (b) incentivising mistake finding/correction rather than penalising it, and (c) showing uncertainty.
Systems for Finding and Correcting Mistakes
One of the problems Herndon, Ash, and Pollin found in R&R’s analysis was and Excel coding error. I love to hate on Excel as much as the next R devotee, but I think that is missing the point. The real lesson is not “don’t use Excel” the real lesson is: we all make mistakes.
(Important point: I refer throughout this post to errors caused by coding mistakes rather than purposeful fabrications and falsifications.)
Coding mistakes are an ever present part of our work. The problem is not that we make coding mistakes. Despite our best efforts we always will. The problem is that we often use tools and practices that make it difficult to find and correct our mistakes.
This is where I can get in some Excel hating: tools and practices that make it difficult to find mistakes include binary files (like Excel’s) that can’t be version controlled in a way that fully reveals the research process, not commenting code, not making your data readily available in formats that make replication easy, not having a system for quickly fixing mistakes when they are found. Sorry R users, but the last three are definitely not exclusive to Excel.
It took Herndon, Ash, and Pollin a considerable amount of time to replicate R & R’s findings and therefore find the Excel error. This seems partially because R & R did not make their analysis files readily available (Herndon, Ash, and Pollin had to ask for them). I’m not sure how this error is going to be corrected and documented. But I imagine it will be like most research corrections: kind of on the fly, mostly emailing and reposting.
How big of a detail is this? There is some debate over how big of a problem this mistake is. Roger Peng ends his really nice post:
The vibe on the Internets seems to be that if only this problem had been identified sooner, the world would be a better place. But my cynical mind says, uh, no. You can toss this incident in the very large bucket of papers with some technical errors that are easily fixed. Thankfully, someone found these errors and fixed them, and that’s a good thing. Science moves on.
I agree with most of this paragraph. But, given how important R & R’s finding was to major policy debates it would have been much better if the mistake was caught sooner rather than later. The tools and practices R & R used made it harder to find and correct the mistake, so policymakers were operating with less accurate information for longer.
Solutions: I’ve written in some detail in the most recent issue of The Political Methodologist about how cloud-based version control systems like GitHub can be used to make finding and correcting mistakes easier. Pull requests, for example, are a really nice way to directly suggest corrections.
Incentivising Error Finding and Correction
Going forward I think it will be interesting to see how this incident shapes researchers’ perceived incentives to make their work easily replicable. Replication is an important part of finding the mistakes that everyone makes. If being found to make a coding mistake (not a fabrication) has a negative impact on your academic career then there are incentives to make finding mistakes difficult, by for example making replication difficult. Most papers do not receive nearly as much attention as R & R’s. So, for most researchers making replication difficult will make it pretty unlikely that anyone will replicate your research and you’ll be home free.
I can't send you my data b/c I think you might find out I made an error. #overlyhonestmethods— Carlisle Rainey (@carlislerainey) January 9, 2013
This is a perverse incentive indeed.
What can we do? Many journals now require replicable code to accompany published articles. This is a good incentive. Maybe we should go further, and somehow directly incentivise the finding and correction of errors in data sets and analysis code. Ideas could include giving more weight to replication studies at hiring and promotion committees. Maybe even allowing these committees to include information on researchers’ GitHub pull requests that meaningfully improve other’s work by correcting mistakes.
This of course might create future perversion incentives to add errors so that they can then be found. I think this is a bit fanciful. There are surely enough negative social incentives (i.e. embarrassment) surrounding making mistakes to prevent this.
Roger Peng’s post highlighted the issue of graphing uncertainty, but I just wanted to build it out a little further. The interpretation of the correlation R & R’s found between GDP Growth and Government Debt could have been tempered significantly before any mistakes were found by more directly communicating their original uncertainty. In their original paper, they presented the relationship using bar graphs of average and median GDP growth per grouped debt/GDP level:
Beyond showing the mean and median there is basically no indication of the distribution of the data they are from.
Herndon, Ash, and Pollin put together some nice graphs of these distributions (and avoid that thing economists do of using two vertical axis with two different meanings).
Here is one that gets rid of the groups altogether:
If R & R had shown a simple scatter plot like this (though they did exclude some of the higher GDP Growth country-years at the high debt end, so their's would have looked different), it would have been much more difficult to overly interpret the substantive–policy–value of a correlation between GDP/growth and debt/GDP.
Maybe this wouldn’t have actually changed the policy debate that much, As Mark Blyth argues in his recent book on austerity “facts never disconfirm a good ideology” (p. 18). But at least Paul Krugman might not have had to debate debt/GDP cutoff points on CNBC (for example time point 12:40):
P.S. To R & R’s credit, they do often make their data available. Their data has been useful for at least one of my papers. However, it is often available in a format that is hard to use for cross-country statistical analysis, including, I would imagine, their own. Though I have never found any errors in the data, reporting and implementing corrections to this data would be piecemeal at best.