Skip to main content

How I Accidentally Wrote a Paper on Supervisory Transparency in the European Union and Why You Should Too

Research is an unpredictable thing. You head in one direction, but end up going another. Here is a recent example:

A co-author and I had an idea for a paper. It's a long story, but basically we wanted to compare banks in the US to those in the EU. This was a situation where our desire to explore a theory was egged on by, what we believed, was available data. In the US it's easy to gather data on banks because the regulators have a nice website where they release the filings banks send them. The data is in a really good format for statistical analysis. US done. We thought our next move would be to quickly gather similar data for EU banks and we would be on our way.

First we contacted the UK's Financial Conduct Authority. Surprisingly, they told us that not only did they not release this data, but it was actually illegal for them to do so. Pretty frustrating. Answers to one question stymied by a lack of data. Argh. I guess we'll just keep looking to see what kind of sample of European countries we can come up with and hope it doesn't lead to ridiculous sampling bias.

We eventually found that only 11 EU countries (out of 28) release any such data and it is in a hodgepodge of different formats making it very difficult to compare across countries. This is remarkable when compared to the US and kind of astounded me given my open data priors. We then looked at EU level initiatives to increase supervisory transparency. The European Banking Authority has made a number of attempts to increase transparency. For example, it asks Member States to submit some basic aggregate level data about their banking systems. It makes this data available on its website.

Countries have a lot of reporting flexibility. They can even choose to label specific items as non-material, non-applicable, or even confidential. Remarkably, a fair number of countries don't even do this. They just report nothing, as we can see here:

Number of Member States without missing aggregate banking data reported to the EBA


Though almost all Member States reported data during the height of the crisis (2009) this is an aberration. In fact a lot of countries just don't report anything.

What are the implications of this secrecy we asked? We decided to do more research to find out. We published the first outcome of this project here. Have a look if you're interested, but that isn't the point of this blog post. The point is that we set off to research one thing, but ended up stumbling upon another problem that was worthy of investigation.

This post's title is clearly a bit facetious. But the general point is serious. We need to be flexible enough and curious enough to be able to answer the questions we didn't even know to ask before we go looking.

Comments

Popular posts from this blog

Slide: one function for lag/lead variables in data frames, including time-series cross-sectional data

I often want to quickly create a lag or lead variable in an R data frame. Sometimes I also want to create the lag or lead variable for different groups in a data frame, for example, if I want to lag GDP for each country in a data frame.I've found the various R methods for doing this hard to remember and usually need to look at old blogposts. Any time we find ourselves using the same series of codes over and over, it's probably time to put them into a function. So, I added a new command–slide–to the DataCombine R package (v0.1.5).Building on the shift function TszKin Julian posted on his blog, slide allows you to slide a variable up by any time unit to create a lead or down to create a lag. It returns the lag/lead variable to a new column in your data frame. It works with both data that has one observed unit and with time-series cross-sectional data.Note: your data needs to be in ascending time order with equally spaced time increments. For example 1995, 1996, 1997. ExamplesNot…

Showing results from Cox Proportional Hazard Models in R with simPH

Update 2 February 2014: A new version of simPH (Version 1.0) will soon be available for download from CRAN. It allows you to plot using points, ribbons, and (new) lines. See the updated package description paper for examples. Note that the ribbons argument will no longer work as in the examples below. Please use type = 'ribbons' (or 'points' or 'lines'). Effectively showing estimates and uncertainty from Cox Proportional Hazard (PH) models, especially for interactive and non-linear effects, can be challenging with currently available software. So, researchers often just simply display a results table. These are pretty useless for Cox PH models. It is difficult to decipher a simple linear variable’s estimated effect and basically impossible to understand time interactions, interactions between variables, and nonlinear effects without the reader further calculating quantities of interest for a variety of fitted values.So, I’ve been putting together the simPH R p…

Do Political Scientists Care About Effect Sizes: Replication and Type M Errors

Reproducibility has come a long way in political science. Many major journals now require replication materials be made available either on their websites or some service such as the Dataverse Network. Most of the top journals in political science have formally committed to reproducible research best practices by signing up to the The (DA-RT) Data Access and Research Transparency Joint Statement.This is certainly progress. But what are political scientists actually supposed to do with this new information? Data and code availability does help avoid effort duplication--researchers don't need to gather data or program statistical procedures that have already been gathered or programmed. It promotes better research habits. It definitely provides ''procedural oversight''. We would be highly suspect of results from authors that were unable or unwilling to produce their code/data.However, there are lots of problems that data/code availability requirements do not address.…