Fun with .Rprofile and customizing R startup

Over the years, I've meticulously compiled–and version controlled–massive and extensive configuration files for virtually all of my most used utilities, most notably vim, tmux, and zsh.

In fact, one of the only configurable utilities for which I had no special configuration schema was R. This is extremely surprising, given that I use R everyday.

One reason I think that this was the case is because I came to R from using general-purpose programming languages for which there is no provision to configure the language in a way that would actually change results or program output.

I only vaguely knew that .Rprofile was a configuration file that some people used, and that others warned against using, but it never occurred to me to actually use it for myself.

Because I never used it, I developed odd habits and rituals in my interactive R programming including adding "stringsAsFactors=FALSE" to all of my "read.csv" function calls and making frequent calls to the "option()" function.

Since I actually began to use and expand my R configuration, though, I've realized how much I've been missing. I pre-set all my preferred options (saving time) and I've even made provisions for some cool tricks and hacks.

That being said, there's a certain danger in using a custom R profile but we'll talk about how to thwart that later.

The R Startup Process

In the absence of any command-line flags being used, when R starts up, it will "source" (run) the site-wide R startup configuration file/script if it exists. In a fresh install of R, this will rarely exist, but if it does, it will usually be in '/Library/Frameworks/R.framework/Resources/etc/' on OS X, 'C:\Program Files\R\R-***\etc\' on Windows, or '/etc/R/' on Debian. Next, it will check for a .Rprofile hidden file in the current working directory (the directory where R is started on the command-line) to source. Failing that, it will check your home directory for the .Rprofile hidden file.

You can check if you have a site-wide R configuration script by running

R.home(component = "home")

in the R console and then checking for the presence of Rprofile.site in that directory. The presence of the user-defined R configuration can be checked for in the directory of whichever path

path.expand("~")

indicates.

More information on the R startup process can be found here and here.

The site-wide R configuration script
For most R installations on primarily single-user systems, using the site-wide R configuration script should be given up in favor of the user-specific configuration. That being said, a look at the boilerplate site-wide R profile that Debian furnishes (but comments out by default) provides some good insight into what might be a good idea to include in this file, if you choose to use it.

##                      Emacs please make this -*- R -*-
## empty Rprofile.site for R on Debian
##
## Copyright (C) 2008 Dirk Eddelbuettel and GPL'ed
##
## see help(Startup) for documentation on ~/.Rprofile and Rprofile.site

# ## Example of .Rprofile
# options(width=65, digits=5)
# options(show.signif.stars=FALSE)
# setHook(packageEvent("grDevices", "onLoad"),
#         function(...) grDevices::ps.options(horizontal=FALSE))
# set.seed(1234)
# .First <- function() cat("\n   Welcome to R!\n\n")
# .Last <- function()  cat("\n   Goodbye!\n\n")

# ## Example of Rprofile.site
# local({
#  # add MASS to the default packages, set a CRAN mirror
#  old <- getOption("defaultPackages"); r <- getOption("repos")
#  r["CRAN"] <- "http://my.local.cran"
#  options(defaultPackages = c(old, "MASS"), repos = r)
#})

Two things you might want to do in a site-wide R configuration file is add other packages to the default packages list and set a preferred CRAN mirror. Other things that the above snippet indicates you can do is set various width and number display options, setting a random-number seed (making random number generation deterministic for reproducible analysis), and hiding the stars that R shows for different significance levels (ostensibly because of their connection to the much-maligned NHST paradigm).

The user-specific .Rprofile
In contrast to the site-wide config (that will be used for all users on the system), the user-specific R configuration file is a place to put more personal preferences, shortcuts, aliases, and hacks. Immediately below is my .Rprofile.

local({r <- getOption("repos")
      r["CRAN"] <- "http://cran.revolutionanalytics.com"
      options(repos=r)})

options(stringsAsFactors=FALSE)

options(max.print=100)

options(scipen=10)

options(editor="vim")

# options(show.signif.stars=FALSE)

options(menu.graphics=FALSE)

options(prompt="> ")
options(continue="... ")

options(width = 80)

q <- function (save="no", ...) {
  quit(save=save, ...)
}

utils::rc.settings(ipck=TRUE)

.First <- function(){
  if(interactive()){
    library(utils)
    timestamp(,prefix=paste("##------ [",getwd(),"] ",sep=""))

  }
}

.Last <- function(){
  if(interactive()){
    hist_file <- Sys.getenv("R_HISTFILE")
    if(hist_file=="") hist_file <- "~/.RHistory"
    savehistory(hist_file)
  }
}

if(Sys.getenv("TERM") == "xterm-256color")
  library("colorout")

sshhh <- function(a.package){
  suppressWarnings(suppressPackageStartupMessages(
    library(a.package, character.only=TRUE)))
}

auto.loads <-c("dplyr", "ggplot2")

if(interactive()){
  invisible(sapply(auto.loads, sshhh))
}

.env <- new.env()


.env$unrowname <- function(x) {
  rownames(x) <- NULL
  x
}

.env$unfactor <- function(df){
  id <- sapply(df, is.factor)
  df[id] <- lapply(df[id], as.character)
  df
}

attach(.env)

message("\n*** Successfully loaded .Rprofile ***\n")

[Lines 1-3]: First, because I don't have a site-wide R configuration script, I set my local CRAN mirror here. My particular choice of mirror is largely arbitrary.

[Line 5]: If stringsAsFactors hasn't bitten you yet, it will.

[Line 9]: Setting 'scipen=10' effectively forces R to never use scientific notation to express very small or large numbers.

[Line 13]: I included the snippet to turn off significance stars because it is a popular choice, but I have it commented out because ever since 1st grade I've used number of stars as a proxy for my worth as a person.

[Line 15]: I don't have time for Tk to load; I'd prefer to use the console, if possible.

[Lines 17-18]: Often, when working in the interactive console I forget a closing brace or paren. When I start a new line, R changes the prompt to "+" to indicate that it is expecting a continuation. Because "+" and ">" are the same width, though, I often don't notice and really screw things up. These two lines make the R REPL mimic the Python REPL by changing the continuation prompt to the wider "...".

[Lines 22-24]: Change the default behavior of "q()" to quit immediately and not save workspace.

[Line 26]: This snippet allows you to tab-complete package names for use in "library()" or "require()" calls. Credit for this one goes to @mikelove.

[Lines 28-34]: There are three main reasons I like to have R save every command I run in the console into a history file.

  • Occasionally I come up with a clever way to solve a problem in an interactive session that I may want to remember for later use; instead of it getting lost in the ether, if I save it to a history file, I can look it up later.
  • Sometimes I need an alibi for what I was doing at a particular time
  • I ran a bunch of commands in the console to perform an analysis not realizing that I would have to repeat this analysis later. I can retrieve the commands from a history file and put it into a script where it belongs.

These lines instruct R to, before anything else, echo a timestamp to the console and to my R history file. The timestamp greatly improves my ability to search through my history for relevant commands.

[Lines 36-42]: These lines instruct R, right before exiting, to write all commands I used in that session to my R command history file. I usually have this set in an environment variable called "R_HISTFILE" on most of my systems, but in case I don't have this defined, I write it to a file in my home directory called .Rhistory.

[Line 44]: Enables the colorized output from R (provided by the colorout package) on appropriate consoles.

[Lines 47-50]: This defines a function that loads a libary into the namespace without any warning or startup messages clobbering my console.

[Line 52]: I often want to autoload the 'dplyr' and 'ggplot2' packages (particularly 'dplyr' as it is now an integral part of my R experience).

[Lines 54-56]: This loads the packages in my "auto.loads" vector if the R session is interactive.

[Lines 58-59]: This creates a new hidden namespace that we can store some functions in. We need to do this in order for these functions to survive a call to "rm(list=ls())" which will remove everything in the current namespace. This is described wonderfully in this blog post.

[Lines 61-64]: This defines a simple function to remove any row names a data.frame might have. This was stolen from Stephen Turner (which was in turn stolen from plyr).

[Lines 66-70]: This defines a function to sanely undo a "factor()" call. This was stolen from Dason Kurkiewicz.

Warnings about .Rprofile
There are some compelling reasons to abstain from using an R configuration file at all. The most persuasive argument against using it is the portability issue: As you begin to rely more and more on shortcuts and options you define in your .Rprofile, your R scripts will depend on them more and more. If a script is then transferred to a friend or colleague, often it won't work; in the worst case scenario, it will run without error but produce wrong results.

There are several ways this pitfall can be avoided, though:

  • For R sessions/scripts that might be shared or used on systems without your .Rprofile, make sure to start the R interpreter with the --vanilla option, or add/change your shebang lines to "#!/usr/bin/Rscript --vanilla". The "--vanilla" option will tell R to ignore any configuration files. Writing scripts that will conform to a vanilla R startup environment is a great thing to do for portability.
  • Use your .Rprofile everywhere! This is a bit of an untenable solution because you can't put your .Rprofile everywhere. However, if you put your .Rprofile on GitHub, you can easily clone it on any system that needs it. You can find mine here.
  • Save your .Rprofile to another file name and, at the start of every R session where you want to use your custom configuration, manually source the file. This will behave just as it would if it were automatically sourced by R but removes the potential for the .Rprofile to be sourced when it is unwanted.

    A variation on this theme is to create a shell alias to use R with your special configuration. For example, adding a shell alias like this:

    alias aR="R_PROFILE_USER=~/.myR/aR.profile R"
    

    will make it so that when "R" is run, it will run as before (without special configuration). In order to have R startup and auto-source your configuration you now have to run "aR". When 'aR' is run, the shell temporarily assigns an environment variable that R will follow to source a config script. In this case, it will source a config file called aR.profile in a hidden .myR subdirectory of my home folder. This path can be changed to anything, though.

This is the solution I have settled on because it is very easy to live with and invalidates concerns about portability.

share this: Facebooktwittergoogle_plusredditpinterestlinkedintumblrmail

5 Responses

  1. Barry R September 18, 2014 / 3:01 am

    Nice, but that invisible environment is going to get unwieldy pretty soon. Two functions, fine, but the post describing the technique defines about a dozen, and all the code is inline in .Rprofile, the only documentation being a few comments. If you put these functions in a package then you can load it from .Rprofile on startup, document it using roxygen, and even use those functions in R sessions that you started with --vanilla.

    • [email protected] September 18, 2014 / 4:15 pm

      That's a really good point. I haven't made a package for my custom function yet, though, but that's a better solution. Thanks for the feedback!

  2. Janko Thyson September 18, 2014 / 5:09 am

    Really nice post!

    I can totally relate ;-) Up until a couple of weeks ago, .Rprofile and .Renviron used to be really vague and kind of "no... let's not ge there". Looking back, a great part of that can be attributed to either poor or very odd documentation on those aspects - unlike yours!

    Just as an idea: maybe feel like writing something about '.onLoad()' and '.onAttach()' as well? ;-P I'm starting to play around with these funcitions in order to ensure startup stuff that should stay closely bound to certain packages and therefore not go into my .Rprofile, but I still don't really get the implications of a) having two different functions for package startup and b) how to best make use of them.

    • [email protected] September 18, 2014 / 4:17 pm

      Haha, I haven't played around too much with .onLoad() or .onAttach() yet–it looks like a blog post about that is in order :)
      And thanks for the kind words about the post!

  3. Jean-Claude Arbaut December 23, 2015 / 4:47 am

    Very nice. I first read your post on Rbloggers, where there is a mistake: attach(.env) has to be put after the functions are defined. I didn't know it's a bug until I tried, actually. But I see the script has been corrected here. Sadly, it was not on Rbloggers. Here is the link:

    http://www.r-bloggers.com/fun-with-rprofile-and-customizing-r-startup/

Leave a Reply

Your email address will not be published. Required fields are marked *