
Reinhart & Rogoff: Everyone makes coding mistakes, we need to make it easy to find them + Graphing uncertainty

You may have already seen a lot written on the replication of Reinhart & Rogoff’s (R & R) much cited 2010 paper done by Herndon, Ash, and Pollin. If you haven’t, here is a round-up of some of what has been written: Konczal, Yglesias, Krugman, Cowen, Peng, FT Alphaville.

This is an interesting issue for me because it involves three topics I really like: political economy, reproducibility, and communicating uncertainty. Others have already commented on these topics in detail. I just wanted to add to this discussion by (a) talking about how this event highlights a real need for researchers to use systems that make finding and correcting mistakes easy, (b) discussing how to incentivise mistake finding and correction rather than penalising it, and (c) showing uncertainty.

Systems for Finding and Correcting Mistakes

One of the problems Herndon, Ash, and Pollin found in R & R’s analysis was an Excel coding error. I love to hate on Excel as much as the next R devotee, but I think that is missing the point. The real lesson is not “don’t use Excel”; the real lesson is that we all make mistakes.

(Important point: I refer throughout this post to errors caused by coding mistakes rather than purposeful fabrications and falsifications.)

Coding mistakes are an ever-present part of our work. The problem is not that we make coding mistakes; despite our best efforts, we always will. The problem is that we often use tools and practices that make it difficult to find and correct our mistakes.

This is where I can get in some Excel hating: tools and practices that make it difficult to find mistakes include binary files (like Excel’s) that can’t be version controlled in a way that fully reveals the research process, not commenting code, not making your data readily available in formats that make replication easy, not having a system for quickly fixing mistakes when they are found. Sorry R users, but the last three are definitely not exclusive to Excel.
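To illustrate the alternative (a minimal hypothetical sketch — the file and column names are mine, not R & R’s), keeping the data in plain-text CSV files and the analysis in a short commented script makes every step of the research process diff-able and version controllable:

```r
# Read plain-text data that can be version controlled and diffed
# (file and column names here are hypothetical)
debt <- read.csv("debt_gdp.csv")

# Comment each transformation so that others can spot mistakes:
# mean real GDP growth within each debt/GDP category
growth_means <- aggregate(growth ~ debt_category, data = debt, FUN = mean)

# Save results as plain text so corrections show up as readable diffs
write.csv(growth_means, "growth_means.csv", row.names = FALSE)
```

Every change to either the data files or the script then shows up as a readable diff in a system like GitHub, which is exactly what makes mistakes findable.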

It took Herndon, Ash, and Pollin a considerable amount of time to replicate R & R’s findings and therefore find the Excel error. This seems partially because R & R did not make their analysis files readily available (Herndon, Ash, and Pollin had to ask for them). I’m not sure how this error is going to be corrected and documented. But I imagine it will be like most research corrections: kind of on the fly, mostly emailing and reposting.

How big of a deal is this? There is some debate over how serious this mistake is. Roger Peng ends his really nice post:

The vibe on the Internets seems to be that if only this problem had been identified sooner, the world would be a better place. But my cynical mind says, uh, no. You can toss this incident in the very large bucket of papers with some technical errors that are easily fixed. Thankfully, someone found these errors and fixed them, and that’s a good thing. Science moves on.

I agree with most of this paragraph. But, given how important R & R’s finding was to major policy debates, it would have been much better if the mistake had been caught sooner rather than later. The tools and practices R & R used made it harder to find and correct the mistake, so policymakers were operating with less accurate information for longer.

Solutions: I’ve written in some detail in the most recent issue of The Political Methodologist about how cloud-based version control systems like GitHub can be used to make finding and correcting mistakes easier. Pull requests, for example, are a really nice way to directly suggest corrections.

Incentivising Error Finding and Correction

Going forward, I think it will be interesting to see how this incident shapes researchers’ perceived incentives to make their work easily replicable. Replication is an important part of finding the mistakes that everyone makes. If being found to have made a coding mistake (not a fabrication) has a negative impact on your academic career, then there are incentives to make finding mistakes difficult, for example by making replication difficult. Most papers do not receive nearly as much attention as R & R’s. So, for most researchers, making replication difficult means it is pretty unlikely that anyone will ever replicate your research, and you’ll be home free.

This is a perverse incentive indeed.

What can we do? Many journals now require replication code to accompany published articles. This is a good incentive. Maybe we should go further and somehow directly incentivise the finding and correction of errors in data sets and analysis code. Ideas could include hiring and promotion committees giving more weight to replication studies. Maybe even allowing these committees to consider researchers’ GitHub pull requests that meaningfully improve others’ work by correcting mistakes.

This of course might create future perverse incentives to add errors so that they can then be found. I think this is a bit fanciful. There are surely enough negative social incentives (e.g. embarrassment) surrounding making mistakes to prevent this.

Showing Uncertainty

Roger Peng’s post highlighted the issue of graphing uncertainty, but I just wanted to build it out a little further. The interpretation of the correlation R & R found between GDP growth and government debt could have been tempered significantly, before any mistakes were found, by more directly communicating their original uncertainty. In their original paper, they presented the relationship using bar graphs of average and median GDP growth per grouped debt/GDP level:

Beyond showing the mean and median, there is basically no indication of the distribution of the data they are drawn from.

Herndon, Ash, and Pollin put together some nice graphs of these distributions (and avoid that thing economists do of using two vertical axes with two different meanings).

Here is one that gets rid of the groups altogether:

If R & R had shown a simple scatter plot like this (though they did exclude some of the higher GDP growth country-years at the high debt end, so theirs would have looked different), it would have been much more difficult to over-interpret the substantive policy value of a correlation between GDP growth and debt/GDP.
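A scatter plot along these lines takes only a few lines of ggplot2. The data below are simulated stand-ins (not R & R’s actual data), just to show the form of the plot:

```r
library(ggplot2)

# Simulated country-year data, purely illustrative
set.seed(1)
rr_data <- data.frame(
  debt_gdp = runif(500, 0, 150),  # public debt as % of GDP
  growth   = rnorm(500, 3, 3)     # real GDP growth (%)
)

ggplot(rr_data, aes(x = debt_gdp, y = growth)) +
  geom_point(alpha = 0.5) +        # every country-year, not just group means
  geom_smooth(method = "loess") +  # smoothed trend with a confidence band
  geom_vline(xintercept = 90, linetype = "dashed") +  # the much-debated 90% cutoff
  labs(x = "Public debt/GDP (%)", y = "Real GDP growth (%)")
```

The confidence band around the smoother communicates uncertainty directly, and the raw points make it obvious how much variation any group average hides.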

Maybe this wouldn’t have actually changed the policy debate that much. As Mark Blyth argues in his recent book on austerity, “facts never disconfirm a good ideology” (p. 18). But at least Paul Krugman might not have had to debate debt/GDP cutoff points on CNBC (for example, time point 12:40):

P.S. To R & R’s credit, they do often make their data available. Their data has been useful for at least one of my papers. However, it is often available in a format that is hard to use for cross-country statistical analysis, including, I would imagine, their own. Though I have never found any errors in the data, reporting and implementing corrections to this data would be piecemeal at best.
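As a sketch of the kind of reshaping this involves (the column and variable names here are hypothetical, not R & R’s actual ones), data that arrive with one column per year can be converted to a country-year format with reshape2:

```r
library(reshape2)

# Hypothetical wide-format table: one row per country, one column per year
wide <- data.frame(
  country = c("Australia", "Belgium"),
  `1946`  = c(120.1, 93.4),
  `1947`  = c(115.8, 89.2),
  check.names = FALSE
)

# Melt to the country-year long format that most cross-country
# statistical analyses expect
long <- melt(wide, id.vars = "country",
             variable.name = "year", value.name = "debt_gdp")
long$year <- as.integer(as.character(long$year))
```

A long, country-year layout like this is what most cross-country regression tools expect, and distributing the data this way would make replication considerably easier.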


Patrick said…
The main lesson is that a scatterplot is all the news outlets can take. Sophisticated econometrics, Granger-causality, cointegration, garch effects, naaaaaaaaaaaaaaaaaaaaaaaaah.
Unknown said…
Sadly, they didn't even look at the scatterplot, only the point estimates.

Also, once the replication data was released (3 years after the working paper came out) it didn't take too long for someone to do a simple (lags!) analysis that gives a much better indication of the direction of the relationship and the uncertainty about it:
Unknown said…
Awesome post. But the thing could have been avoided if they had not used Excel. If you read the article here you will see that T. Herndon received the data in an Excel spreadsheet...
Speaking of reproducible research...
Unknown said…
Excel was definitely part of the problem.

But not making data and source code available can be a problem regardless of how the stats are done (though even this is always worse with binary files like Excel's).
Unknown said…
The simplest (?) answer to this is to use Excel and Mathematica to arrive at the same answer. It is VERY hard to make a numeric mistake and then replicate that same mistake symbolically. I have been using that method for years. Anecdotally, of my own mistakes I have found so far, 100% of them have been in the Excel file.
