
How I Accidentally Wrote a Paper on Supervisory Transparency in the European Union and Why You Should Too

Research is an unpredictable thing. You head in one direction, but end up going another. Here is a recent example:

A co-author and I had an idea for a paper. It's a long story, but basically we wanted to compare banks in the US to those in the EU. Our desire to explore a theory was egged on by what we believed was readily available data. In the US it's easy to gather data on banks because the regulators have a nice website where they release the filings banks send them, in a format well suited to statistical analysis. US done. We thought our next move would be to quickly gather similar data for EU banks and we would be on our way.

First we contacted the UK's Financial Conduct Authority. Surprisingly, they told us that not only did they not release this data, it was actually illegal for them to do so. Pretty frustrating: an answer to one question stymied by a lack of data. Argh. I guess we'll just keep looking to see what kind of sample of European countries we can come up with and hope it doesn't lead to ridiculous sampling bias.

We eventually found that only 11 EU countries (out of 28) release any such data, and it comes in a hodgepodge of formats that makes comparison across countries very difficult. This is remarkable when set against the US, and it kind of astounded me given my open data priors. We then looked at EU-level initiatives to increase supervisory transparency. The European Banking Authority (EBA) has made a number of attempts to increase transparency. For example, it asks Member States to submit some basic aggregate-level data about their banking systems, which it makes available on its website.

Countries have a lot of reporting flexibility: they can choose to label specific items as non-material, non-applicable, or even confidential. Remarkably, a fair number of countries don't even do this. They just report nothing, as we can see here:

[Figure: Number of Member States without missing aggregate banking data reported to the EBA]
Though almost all Member States reported data during the height of the crisis (2009), this was an aberration. In most years, a fair number of countries simply don't report anything.
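To make the tally behind the figure concrete, here is a minimal R sketch. The data frame and its columns are made-up illustrations, not the actual EBA dataset: assume one row per country-year reporting item, with NA wherever an item was not reported.

```r
library(dplyr)

# Toy, made-up data standing in for the EBA aggregate reports:
# one row per country-year item, NA where the item was not reported.
eba_data <- data.frame(
  country = rep(c("DE", "FR", "MT"), each = 4),
  year    = rep(c(2009, 2009, 2010, 2010), times = 3),
  value   = c(1.2, 3.4, 5.6, 7.8,  # DE: complete in both years
              2.1, NA,  4.3, 6.5,  # FR: one item missing in 2009
              NA,  NA,  NA,  NA)   # MT: reports nothing at all
)

# Count, per year, the Member States with no missing items
complete_reporters <- eba_data %>%
  group_by(year, country) %>%
  summarise(complete = !any(is.na(value))) %>%
  group_by(year) %>%
  summarise(n_complete = sum(complete))

complete_reporters
```

On the real data, the same grouping logic would produce the series plotted in the figure above.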

What, we asked, are the implications of this secrecy? We decided to do more research to find out, and we published the first outcome of this project here. Have a look if you're interested, but that isn't the point of this blog post. The point is that we set off to research one thing, but ended up stumbling upon another problem that was worthy of investigation.

This post's title is clearly a bit facetious, but the general point is serious: we need to be flexible enough, and curious enough, to answer the questions we didn't even know to ask before we went looking.

