New Tools for Reproducible Research with R

JJ Allaire and Yihui Xie

2012/06/14

Reproducible Research with R

Why Reproducible Research?

Prime Directive: Trustworthy Software

Those who receive the results of modern data analysis have limited opportunity to verify the results by direct observation. Users of the analysis have no option but to trust the analysis, and by extension the software that produced it.

This places an obligation on all creators of software to program in such a way that the computations can be understood and trusted. This obligation I label the Prime Directive.


Chambers, Software for Data Analysis: Programming with R

New Tools

Productivity Tools for Sweave / LaTeX

R Markdown

The markdown Package

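This code produces the same result as Knit HTML in RStudio (with no run-time dependency on RStudio):

   library(knitr)                          # provides knit()
   library(markdown)                       # provides markdownToHTML()
   knit("foo.Rmd")                         # foo.Rmd -> foo.md
   markdownToHTML("foo.md", "foo.html")    # foo.md  -> foo.html
   browseURL("foo.html")                   # open the result in a browser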

Enables use of R Markdown with any editor or IDE

Distributing R Markdown Documents

The knitr Package

Motivation (as a student and TA)

I do homework, I grade homework, and I saw this:

Principle 1: Beautiful output by default

Code highlighting

Code reformatting

## option tidy=FALSE
for(k in 1:10){j=cos(sin(k)*k^2)+3;print(j-5)}

Same code, reformatted:

## option tidy=TRUE
for (k in 1:10) {
  j = cos(sin(k) * k^2) + 3
  print(j - 5)
}
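
The reformatting behind tidy=TRUE is done by the formatR package, so it can also be tried outside a report. A minimal sketch, assuming formatR is installed; the exact layout of the result may vary between formatR versions:

```r
# Assumes the formatR package is installed.
library(formatR)

messy <- "for(k in 1:10){j=cos(sin(k)*k^2)+3;print(j-5)}"

# output = FALSE returns the result instead of printing it
res <- tidy_source(text = messy, output = FALSE)

# text.tidy holds the reformatted code as a character vector
cat(res$text.tidy, sep = "\n")
```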

Principle 2: What you imagined is what you get

Principle 3: Focus on R programming

Principle 4: Be sustainable

Compare

> (x = 0)
[1] 0
> x = x + 1

to the knitr default, where output is commented out so the code can be copied and run as is:

(x = 0)
## [1] 0
x = x + 1

You do not appreciate this unless you have been a homework grader.
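
Both styles can be reproduced with the chunk options prompt and comment. The sketch below knits the same two lines twice from an in-memory R Markdown chunk; the helper name knit_chunk is made up for this illustration, and knitr is assumed to be installed:

```r
library(knitr)

# Hypothetical helper: knit a two-line chunk with the given chunk options
# and return the resulting Markdown as a string.
knit_chunk <- function(chunk_opts = "") {
  src <- c(paste0("```{r", chunk_opts, "}"), "(x = 0)", "x = x + 1", "```")
  knit(text = src, quiet = TRUE)
}

# knitr default: no prompts, output lines prefixed with "## "
default_out <- knit_chunk()

# Sweave-like: "> " prompts in front of the code, output left bare
sweave_out <- knit_chunk(", prompt=TRUE, comment=NA")

cat(default_out)
cat(sweave_out)
```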

Principle 5: I write the core, you can define the decoration

Output hooks

You can control how the source code, normal output, warnings, messages, errors and plots are written in the output document.

knit_hooks$set(source = function(x, options) {
    paste("\\begin{DearSource}", x, 
          "\\end{DearSource}", sep = "")
})

LaTeX, HTML, Markdown and reStructuredText are supported, and it is straightforward to add other formats.
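
The same mechanism works for Markdown output. As a rough sketch (assuming knitr is installed; the blockquote styling is only an illustration), the following starts from knitr's built-in Markdown hooks and replaces just the warning hook:

```r
library(knitr)

# Start from knitr's built-in Markdown hooks, then override one of them
render_markdown()
knit_hooks$set(warning = function(x, options) {
  # x is the warning text as knitr would have written it
  paste0("\n> **Warning:** ", x, "\n")
})

src <- c("```{r}", "warning('check your units')", "```")
out <- knit(text = src, quiet = TRUE)
cat(out)

# Put the default hooks back so later documents are unaffected
knit_hooks$restore()
```

Calling render_markdown() first keeps the other hooks (source, output, plot and so on) at their Markdown defaults, so only warnings change appearance.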

Principle 6: All your base are belong to us

Principle 7: Do not scare beginners

Principle 8: Open source is open

In theory you can use any language with knitr, e.g.

<<test-python, engine='python'>>=
x = 'hello, python world!'
print(x)
print(x.split(' '))
@

Contributions needed!
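
Language engines are ordinary R functions stored in knit_engines, so the registered ones can be listed and new ones added. A small sketch, assuming knitr is installed; the engine name upper is invented for this example and simply upper-cases the chunk body:

```r
library(knitr)

# Engines that this knitr installation already knows about
print(names(knit_engines$get()))

# A toy engine (hypothetical name "upper"): the "result" of running
# the chunk is its own code converted to upper case.
knit_engines$set(upper = function(options) {
  out <- toupper(options$code)
  engine_output(options, options$code, out)
})

# A chunk declared with engine='upper' would now be handled by it
print("upper" %in% names(knit_engines$get()))
```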

Principle 9: Literate programming is programming

Programmable reports: Example 1

<<setup>>=
library(grid)  # convertWidth(), grobWidth() etc. are used in the next chunk header
library(gridExtra)
g = tableGrob(head(iris, 4))
@

<<draw-table, fig.width=convertWidth(grobWidth(g), "in", value=TRUE), fig.height=convertHeight(grobHeight(g), "in", value=TRUE), dev='png', dpi=150>>=
grid.draw(g)
@
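
Example 1 relies on the fact that knitr evaluates chunk options as R expressions, so an option can depend on objects created earlier. A smaller sketch of the same idea in R Markdown chunk syntax, assuming knitr is installed; the flag show_details is a made-up name:

```r
library(knitr)

# Hypothetical flag that decides whether a chunk is run and shown
show_details <- FALSE

src <- c(
  "Report body.",
  "",
  "```{r details, eval=show_details, echo=show_details}",
  "summary(cars)",
  "```"
)

# The options eval and echo are looked up when the chunk is processed
out <- knit(text = src, quiet = TRUE)
cat(out)
```

Setting show_details to TRUE and knitting again would include both the code and its output.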

Programmable reports: Example 2

Chunk hooks are functions associated with code chunks.

knit_hooks$set(lord = function(before, options, envir) {
  library(twitteR)
  # Authentication with OAuth here, then
  if (!before) {
    msg = paste0('I have finished the chunk ',
                 options$label, ', my Lord!')
    tweet(msg)
  }
})
# enable the chunk hook
opts_chunk$set(lord = TRUE)
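
A chunk hook that needs no external service may be easier to experiment with. The sketch below, assuming knitr is installed, times a chunk and appends the elapsed time to the output; the option name timeit and the chunk label slow-step are invented for the example:

```r
library(knitr)

# The hook runs twice per chunk: once before (before = TRUE) and once after.
# A closure keeps the start time between the two calls.
timer <- local({
  start <- NULL
  function(before, options, envir) {
    if (before) {
      start <<- Sys.time()
      return(NULL)
    }
    elapsed <- round(as.numeric(difftime(Sys.time(), start, units = "secs")), 2)
    # A character return value is inserted into the output document
    paste0("\nChunk `", options$label, "` took ", elapsed, " seconds.\n")
  }
})
knit_hooks$set(timeit = timer)

# The hook fires only for chunks where the option timeit is not NULL
src <- c("```{r slow-step, timeit=TRUE}", "Sys.sleep(0.1)", "```")
out <- knit(text = src, quiet = TRUE)
cat(out)
```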

Conclusions

IN CODE WE TRUST

Questions and comments?