Showing posts with the label Stata

Stata Country Name Standardizer (Updated)

I just updated my Stata do-file for standardizing country names (see the earlier post here). The main update is that I've added World Values Survey country codes. The do-file now lives in its own Git repository here. I hope to have an R version of this in the near future. (I still like using Stata to merge together large cross-country data sets. For example, the full World Values Survey is a bit unwieldy in R.)

Update 20 February 2012: I just ran across Vincent Arel-Bundock's countrycode package for R. I haven't tried it out yet, but from reading the documentation it looks like countrycode does pretty much what my do-file does, but better, e.g. it includes more country coding schemes. Vincent Arel-Bundock is also the author of another R package I really like, WDI. WDI makes it easy to grab World Bank Indicators. I've used it a number of times in posts for this blog.
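For readers who want to see the general idea behind standardizing country names, here is a minimal sketch in Python. It is not the do-file itself and not the countrycode package: the lookup table, the function names `standardize` and `merge_on_country`, and the input values are all hypothetical, and ISO alpha-3 codes stand in for the IMF and World Values Survey codes. The approach is simply to map every known spelling to one standard name plus a code, then join on the code.

```python
# Hypothetical lookup table: known spelling -> (standard name, ISO alpha-3 code).
# The ISO code is a stand-in for the IMF / WVS codes mentioned in the post.
STANDARD_NAMES = {
    "bosnia-hertz": ("Bosnia and Herzegovina", "BIH"),
    "bosnia-herzegovina": ("Bosnia and Herzegovina", "BIH"),
    "bosnia and herzegovina": ("Bosnia and Herzegovina", "BIH"),
}


def standardize(name):
    """Return (standard name, code), or (original name, None) if unknown."""
    key = " ".join(name.split()).lower()  # collapse whitespace, ignore case
    return STANDARD_NAMES.get(key, (name, None))


def merge_on_country(rows_a, rows_b):
    """Combine two lists of dicts, one output row per standardized country code."""
    by_code = {}
    unmatched = []  # spellings the lookup table does not cover yet
    for row in rows_a + rows_b:
        std_name, code = standardize(row["country"])
        if code is None:
            unmatched.append(row["country"])
            continue
        merged = by_code.setdefault(code, {"country": std_name, "code": code})
        merged.update({k: v for k, v in row.items() if k != "country"})
    return list(by_code.values()), unmatched


# Made-up example rows: the same country spelled two different ways.
data_a = [{"country": "Bosnia-Hertz", "x": 1}]
data_b = [{"country": "Bosnia-Herzegovina", "y": 2}, {"country": "Atlantis", "y": 3}]
merged, unmatched = merge_on_country(data_a, data_b)
print(merged)
print(unmatched)
```

Names that come back in `unmatched` are the ones to add to the lookup table before merging again.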

Standardise Country Names For Stata Data

If you regularly put together data sets for cross-country analysis, you'll probably know that it's a real pain to standardise country names so that you can merge together files from different sources. For example, say you want to merge two data sets: A and B. In data set A the country Bosnia and Herzegovina is referred to as "Bosnia-Hertz" and in B it is called "Bosnia-Herzegovina". To merge them into one file that you can use for data analysis, you have to find this discrepancy and then change at least one of the names so that both are the same. This is really tedious to do across multiple data sets with tens or hundreds of countries. Over the years I've created a Stata do-file that standardises country names and attaches their IMF country codes. You can find the file here. Naturally, it only standardises the country name variations that I've come across. An easy way to check if a country name has not been standardised is to see if the do