As I go about cleaning and merging data sets with R I often end up creating and using simple functions over and over. When this happens, I stick them in the DataCombine package. This makes it easier for me to remember how to do an operation and others can possibly benefit from simplified and (hopefully) more intuitive code.
I've talked about some of the commands in DataCombine in previous posts. In this post I'll give examples for a few more that I've added over the past couple of months. Note: these examples are based on DataCombine version 0.1.11.
Here is a brief rundown of the functions covered in this post:
FindReplace: replaces multiple patterns found in a character string column of a data frame.
MoveFront: moves one or more variables to the front of a data frame. This is useful when a data frame has many variables.
rmExcept: removes all objects from a workspace except those specified by the user.
Recently I needed to replace many patterns in a column of strings. Here is a short example. Imagine we have a data frame like this:
ABData <- data.frame(a = c("London, UK", "Oxford, UK", "Berlin, DE", "Hamburg, DE", "Oslo, NO"), b = c(8, 0.1, 3, 2, 1))
OK, now I want to replace the UK and DE parts of the strings with England and Germany. So I create a data frame with two columns. The first records the pattern and the second records what I want to replace the pattern with:
Replaces <- data.frame(from = c("UK", "DE"), to = c("England", "Germany"))
Now I can just use FindReplace to make the replacements all at once:
library(DataCombine)

ABNewDF <- FindReplace(data = ABData, Var = "a", replaceData = Replaces,
                       from = "from", to = "to", exact = FALSE)

# Show changes
ABNewDF
##                  a   b
## 1  London, England 8.0
## 2  Oxford, England 0.1
## 3  Berlin, Germany 3.0
## 4 Hamburg, Germany 2.0
## 5         Oslo, NO 1.0
If you set exact = TRUE then FindReplace will only replace exact pattern matches. Also, you can set vector = TRUE to return only a vector of the column you replaced (the Var column), rather than the whole data frame.
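To make the difference between the two matching modes concrete, here is a minimal base R sketch. It is not DataCombine's actual code. It assumes that exact matching means a value is replaced only when the whole string equals the pattern, and it treats patterns as literal text (fixed = TRUE). The names cities, partial and exact_only are made up for the illustration.

```r
# Illustration only: a base R analogue, not DataCombine's implementation
cities <- c("London, UK", "Oxford, UK", "Berlin, DE", "Hamburg, DE", "Oslo, NO")
Replaces <- data.frame(from = c("UK", "DE"), to = c("England", "Germany"),
                       stringsAsFactors = FALSE)

# Substring replacement, comparable in spirit to exact = FALSE
partial <- cities
for (i in seq_len(nrow(Replaces))) {
    partial <- gsub(Replaces$from[i], Replaces$to[i], partial, fixed = TRUE)
}

# Whole-value replacement (assumed reading of exact = TRUE)
exact_only <- cities
for (i in seq_len(nrow(Replaces))) {
    exact_only[exact_only == Replaces$from[i]] <- Replaces$to[i]
}

partial
exact_only
```

In this sketch partial has the country codes swapped for names, while exact_only is unchanged because no value is exactly "UK" or "DE".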
On occasion I've wanted to move a few variables to the front of a data frame. The MoveFront function makes this pretty simple. It only has two arguments: data and Var. The data argument is the data frame, and Var is a character vector with the columns I want to move to the front of the data frame, in the order that I want them. Here is an example:
# Create dummy data
A <- B <- C <- 1:50
OldOrder <- data.frame(A, B, C)

names(OldOrder)
## [1] "A" "B" "C"
# Move B and A to the front
NewOrder2 <- MoveFront(OldOrder, c("B", "A"))

names(NewOrder2)
## [1] "B" "A" "C"
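For comparison, the same reordering can be written in base R with setdiff. This is only a sketch of the idea, not how MoveFront is implemented, and front, rest and Reordered are names I made up for the example.

```r
# Illustration only: base R column reordering, not MoveFront's own code
A <- B <- C <- 1:50
OldOrder <- data.frame(A, B, C)

# Columns to put first, in the order wanted
front <- c("B", "A")

# All remaining columns keep their original order
rest <- setdiff(names(OldOrder), front)

Reordered <- OldOrder[, c(front, rest)]
names(Reordered)
```

This works, but you have to build the full column order yourself, which is the step MoveFront saves.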
Finally, sometimes I want to clean up my workspace and keep only specific objects, removing everything else. This is straightforward with rmExcept. For example:
# Create objects
A <- 1
B <- 2
C <- 3

# Remove all objects except for A
rmExcept("A")
## Removed the following objects:
## ABData, ABNewDF, B, C, NewOrder2, OldOrder, Replaces

# Show workspace
ls()
## [1] "A"
You can set the environment you want to clean up with the envir argument. By default it is your global environment.
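To show what cleaning a specific environment amounts to, here is a rough base R equivalent using ls, setdiff and rm. It is a sketch and not rmExcept's source. It works on a throwaway environment (scratch, a name I made up) so that running it does not wipe your global workspace.

```r
# Illustration only: base R analogue of keeping some objects and removing the rest
scratch <- new.env()
assign("A", 1, envir = scratch)
assign("B", 2, envir = scratch)
assign("C", 3, envir = scratch)

# Objects to keep
keep <- "A"

# Everything in the environment that is not on the keep list
to_remove <- setdiff(ls(envir = scratch), keep)
rm(list = to_remove, envir = scratch)

ls(envir = scratch)
```

After this runs, only A is left in scratch. Passing an environment explicitly like this is a cautious way to try out any remove-everything-else operation before pointing it at the global environment.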