knitr: elegant, flexible, and fast dynamic report generation with R

# Sweave

## Transition from Sweave to knitr

### 2012-02-24

Before knitr 1.0, it was compatible with Sweave for easier transition from Sweave to knitr, but the compatibility was dropped since v1.0 for (much) easier maintenance of this package. If you have an Rnw document written for Sweave, the first step you can do is to call Sweave2knitr() on it, and knitr will automatically correct the syntax (mainly chunk options, e.g. results=hide should be results='hide', and eval=true should be eval=TRUE, etc).

library(knitr)
Sweave2knitr('old-document.Rnw') # you will get old-document-knitr.Rnw by default
# see ?Sweave2knitr for details


## New syntax for chunk options

By default, knitr uses a new syntax to parse chunk options (in chunk headers <<>>=), which is similar to R function arguments. This gives us much more power than the old Sweave syntax. You can use arbitrary objects in chunk options and make use of full power of R. Here is a trivial example of setting graphical parameters for base R graphics:

<<par-hook, cache = FALSE>>=
knit_hooks$set(pars = function(before, options, envir) { if (before) graphics::par(options$pars)
})
@

<<use-pars, pars = list(mar=c(4, 4, .1, .1), mgp=c(2, 1, 0))>>=
plot(mtcars[, 1:2])
@


First we have set a chunk hook named pars, which uses the chunk option pars to set par(); then we pass a list of parameters to the chunk option pars, which will be passed to the hook function and used before (if (before)) this chunk is evaluated. This enables us to hide the long and boring code to set graphical parameters in the output.

The chunk options can even make use of objects in a chunk dynamically. Here is another example of setting the caption for the figure environment:

<<setup, cache = FALSE>>=
opts_knit$set(eval.after = 'fig.cap') # evaluate fig.cap after the chunk @ <<t-test, fig.cap = paste("The P-value is", t.test(x)$p.value)>>=
x = rnorm(100)
boxplot(x)
@


By default, all chunk options are evaluated before a chunk is executed (e.g. the pars option in the first example), but we can also postpone the evaluation of some options by setting the package option eval.after. In this example, the figure caption is dynamically generated from the P-value of a t-test on the object x in the chunk.

Neither of the above cases is possible in Sweave: on one hand, it is impossible to write literal commas in chunk options because commas are reserved as separators for options; on the other, there are only three types of objects supported in Sweave options – logical, numeric and character values, and all of them should be scalars. The root reason is Sweave parses the options by text string operations such as strsplit(), and knitr treats these options as formal arguments of a function (see ?formals), so you can use any valid R expressions.

The new syntax is more consistent with R syntax, so you do not have to remember any new rules, e.g. in the old syntax, you must not quote character strings or write literal commas, but in the new syntax, you write character strings in exactly the same way as you do in R.

## Other compatibility issues with Sweave

Note most of the issues described in this section can be automatically solved by Sweave2knitr().

Some features of Sweave were dropped in knitr and some were changed, including:

• concordance was changed to support RStudio; if opts_knit$get('concordance') is TRUE, a file named input-concordance.tex will be written for PDF/Rnw synchronization and error navigation purposes • keep.source was merged into a more flexible option tidy • print was dropped: whether an R expression is going to be printed is consistent with your experience of using R (e.g., x <- 1 will not be printed, while 1:10 will; just imagine you are typing the commands in an R console/terminal); if you really want the output of an expression to be invisible, you may use the function invisible() • term was dropped (think term=TRUE) • strip.white has a slightly different meaning: in knitr, strip.white = TRUE only removes the white lines in the beginning or end of a source chunk; Sweave allows values true, all, and false, but knitr only allows TRUE and FALSE • prefix was dropped (think prefix=TRUE) • prefix.string was renamed to fig.path and it is always used for figure filenames • eps, pdf and all logical options for graphics devices were dropped: please use the new option dev, which is similar to grdevice in Sweave but has more than 20 predefined graphical devices (if that is not enough, please let me know, or set it in chunk options by yourself); in old ages, we may have to use both eps and pdf (Sweave uses eps=TRUE, pdf=TRUE), but nowadays there is usually no need to generate multiple image formats for the same plot; that said, knitr also supports multiple formats per plot – just set dev to be a character vector, e.g. dev=c('pdf', 'png') • fig was dropped; now use fig.keep: fig.keep='high' is equivalent to fig=TRUE and fig.keep='none' is the same as fig=FALSE • width, height were changed to fig.width and fig.height respectively • \SweaveOpts{} and \SweaveInput{} are deprecated; use opts_chunk$set() and the chunk option child respectively
• knitr does not allow duplicate chunk labels, and Sweave does; note duplicate labels can cause problems because figure filenames are based on chunk labels, and a latter chunk with the same label as a previous chunk can override the figures generated in the previous chunk

For logical options, only TRUE/FALSE/T/F are supported (the first two are recommended), and true/false will not work, e.g., eval=FALSE is OK, and eval=false is not (unless there is an R object named false which happens to take a logical value). Chunk reference using <<chunk-label>> is still available, and there are other approaches for reusing chunks, e.g., use the new option ref.label or the function run_chunk(); chunk references can be recursive (see the demo chunk reference).

Besides, the LaTeX style file Sweave.sty was dropped as well; it has brought too much confusion to users since it is shipped with R instead of LaTeX; knitr has built-in styles using standard LaTeX packages, and users are free to change them using hooks.

For inline R code as in \Sexpr{}, knitr will automatically format the results – too big or too small numbers will be written in scientific notations (e.g. $3.14 \times 10^{-5}$ instead of the less readable 3.14e-05; see output demo). We can also call purl() to extract R code, which is similar to Stangle().

Some answers to Sweave FAQ’s from vignette('Sweave', package='utils') are different in knitr:

• A.4 Empty figure chunks give LaTeX errors.
• empty figure chunks will not cause LaTeX errors in knitr because figures will not be generated at all, and it is safe to use fig.keep (close to fig=TRUE in Sweave) even if there are not plots in a chunk
• A.5 Why do R lattice graphics not work?
• lattice, ggplot2 and other grid-based graphics will work as expected in knitr
• A.7 Creating several figures from one figure chunk does not work.
• multiple figures per chunk will work as expected in knitr; no need to use any special tricks such as cat()
• A.9 How can I change the formatting of R input and output chunks?
• we can use output hooks to change the formatting of output, and there is complete freedom – the output does not have to rely on the fancyvrb package, and there is no need to hack at Sweave.sty either
• A.10 How can I change the line length of R input and output?
• the width option is set to 75 by default in knitr, since R’s default is often too wide (besides, the useFancyQuotes option is set to FALSE to avoid a common problem caused by multi-byte characters, and digits is set to 4 to avoid too many digits)
• A.13 Can I use Sweave for HTML files?
• knitr comes with a set of hooks to deal with HTML output (see minimal demos)

Note knitr does not check the encoding of the input document using the tricks in Sweave like using \usepackage[foo]{inputenc}. You have to specify the encoding explicitly if you are not using the native encoding in your system, e.g. knit('foo.Rnw', encoding = 'GBK').

## Compatibility with pgfSweave

dev='tikz' in knitr means tikz=TRUE in pgfSweave, and external=TRUE was implemented differently – the cache of tikz graphics is moved to the R level instead of relying on the LaTeX package tikz (if a tikz plot is externalized, knitr will try to compile it to PDF immediately and use \includegraphics{filename} to insert it into the output; in comparison, external=FALSE uses \input{filename.tikz}); this frees the users from the GNU make utility and understanding tikz externalization.

## Compatibility with cacheSweave

Cache was implemented in knitr with functions in base R (e.g., lazyLoad()) and does not rely on other add-on packages; for cached chunks, the results from the last run will be loaded and written into the output, and this is more consistent with the default behavior of R code (users may wonder why print(x) does not produce any output for cached chunks; plots in cached chunks will still be in the output as well). However, bear in mind that not all side-effects can be cached; see the cache page.

## Sweave as a subset of knitr

The design of knitr is highly modularized so that even if you want to go back to the Sweave style, you are always free to do so with a single function in your chunk:

<<setup, include=FALSE, cache=FALSE>>=
render_sweave()
@


This tells knitr that you miss the good old Sweave.sty and Sinput/Soutput environments, and knitr is ready to use them for your output. It is just a matter of output hooks, which is orthogonal to other steps in the whole process.