Skip to main content

Posts

Showing posts from December, 2014

Simulated or Real: What type of data should we use when teaching social science statistics?

I just finished teaching a new course on collaborative data science to social science students. The materials are on GitHub if you're interested. What did we do and why? Maybe the most unusual thing about this class from a statistics pedagogy perspective was that it was entirely focused on real world data; data that the students gathered themselves. I gave them virtually no instruction on what data to gather. They gathered data they felt would help them answer their research questions. Students directly confronted the data warts that usually consume a large proportion of researchers' actual time. My intention was that the students systematically learn tools and best practices for how to address these warts. This is in contrast to many social scientists' statistics education. Typically, students are presented with pre-arranged data. They are then asked to perform some statistical function with it. The end. This leaves students underprepared for actually using statist

Set up R/Stan on Amazon EC2

A few months ago I posted the script that I use to set up my R/JAGS working environment on an Amazon EC2 instance. Since then I've largely transitioned to using R/ Stan to estimate my models. So, I've updated my setup script (see below). There are a few other changes: I don't install/use RStudio on Amazon EC2. Instead, I just use R from the terminal. Don't get me wrong, I love RStudio. But since what I'm doing on EC2 is just running simulations (I handle the results on my local machine), RStudio is overkill. I don't install git anymore. Instead I use source_url (from devtools) and source_data (from repmis) to source scripts from GitHub. Again all of the manipulation I'm doing to these scripts is on my local machine.