Skip to main content

How I Accidentally Wrote a Paper on Supervisory Transparency in the European Union and Why You Should Too

Research is an unpredictable thing. You head in one direction, but end up going another. Here is a recent example:

A co-author and I had an idea for a paper. It's a long story, but basically we wanted to compare banks in the US to those in the EU. This was a situation where our desire to explore a theory was egged on by, what we believed, was available data. In the US it's easy to gather data on banks because the regulators have a nice website where they release the filings banks send them. The data is in a really good format for statistical analysis. US done. We thought our next move would be to quickly gather similar data for EU banks and we would be on our way.

First we contacted the UK's Financial Conduct Authority. Surprisingly, they told us that not only did they not release this data, but it was actually illegal for them to do so. Pretty frustrating. Answers to one question stymied by a lack of data. Argh. I guess we'll just keep looking to see what kind of sample of European countries we can come up with and hope it doesn't lead to ridiculous sampling bias.

We eventually found that only 11 EU countries (out of 28) release any such data and it is in a hodgepodge of different formats making it very difficult to compare across countries. This is remarkable when compared to the US and kind of astounded me given my open data priors. We then looked at EU level initiatives to increase supervisory transparency. The European Banking Authority has made a number of attempts to increase transparency. For example, it asks Member States to submit some basic aggregate level data about their banking systems. It makes this data available on its website.

Countries have a lot of reporting flexibility. They can even choose to label specific items as non-material, non-applicable, or even confidential. Remarkably, a fair number of countries don't even do this. They just report nothing, as we can see here:

Number of Member States without missing aggregate banking data reported to the EBA

Though almost all Member States reported data during the height of the crisis (2009) this is an aberration. In fact a lot of countries just don't report anything.

What are the implications of this secrecy we asked? We decided to do more research to find out. We published the first outcome of this project here. Have a look if you're interested, but that isn't the point of this blog post. The point is that we set off to research one thing, but ended up stumbling upon another problem that was worthy of investigation.

This post's title is clearly a bit facetious. But the general point is serious. We need to be flexible enough and curious enough to be able to answer the questions we didn't even know to ask before we go looking.


Popular posts from this blog

A Link Between topicmodels LDA and LDAvis

Carson Sievert and Kenny Shirley have put together the really nice LDAvis R package. It provides a Shiny-based interactive interface for exploring the output from Latent Dirichlet Allocation topic models. If you've never used it, I highly recommend checking out their XKCD example (this paper also has some nice background). LDAvis doesn't fit topic models, it just visualises the output. As such it is agnostic about what package you use to fit your LDA topic model. They have a useful example of how to use output from the lda package. I wanted to use LDAvis with output from the topicmodels package. It works really nicely with texts preprocessed using the tm package. The trick is extracting the information LDAvis requires from the model and placing it into a specifically structured JSON formatted object. To make the conversion from topicmodels output to LDAvis JSON input easier, I created a linking function called topicmodels_json_ldavis . The full function is below. To

Set up R/Stan on Amazon EC2

A few months ago I posted the script that I use to set up my R/JAGS working environment on an Amazon EC2 instance. Since then I've largely transitioned to using R/ Stan to estimate my models. So, I've updated my setup script (see below). There are a few other changes: I don't install/use RStudio on Amazon EC2. Instead, I just use R from the terminal. Don't get me wrong, I love RStudio. But since what I'm doing on EC2 is just running simulations (I handle the results on my local machine), RStudio is overkill. I don't install git anymore. Instead I use source_url (from devtools) and source_data (from repmis) to source scripts from GitHub. Again all of the manipulation I'm doing to these scripts is on my local machine.

Slide: one function for lag/lead variables in data frames, including time-series cross-sectional data

I often want to quickly create a lag or lead variable in an R data frame. Sometimes I also want to create the lag or lead variable for different groups in a data frame, for example, if I want to lag GDP for each country in a data frame. I've found the various R methods for doing this hard to remember and usually need to look at old blog posts . Any time we find ourselves using the same series of codes over and over, it's probably time to put them into a function. So, I added a new command– slide –to the DataCombine R package (v0.1.5). Building on the shift function TszKin Julian posted on his blog , slide allows you to slide a variable up by any time unit to create a lead or down to create a lag. It returns the lag/lead variable to a new column in your data frame. It works with both data that has one observed unit and with time-series cross-sectional data. Note: your data needs to be in ascending time order with equally spaced time increments. For example 1995, 1996