Note: you are no longer recommended to use the pandoc()
function in knitr. Please try the rmarkdown package instead: http://rmarkdown.rstudio.com
The function pandoc()
in knitr (since version 1.2) was designed to
convert Markdown documents to other formats such as LaTeX/PDF, HTML and Word
(odt/docx). The main idea is to minimize the command-line call by wrapping
commands into a configuration file or embedded configurations. Normally we
call Pandoc via command line like this:
pandoc -s --mathjax --number-sections --bibliography=foo.bib \
-o output.html input.md
It is tedious to type the same command again and again. What pandoc()
does
is to execute the command like above in R via system()
, but read the
Pandoc arguments from a config file, so that we can write all arguments in
the file once, and simply call pandoc('input.md')
afterwards.
Please follow the instructions on the Pandoc website to install it.
Absolute beginners
If you have no experience using the command line, you can try this function
without any configurations. Write a Markdown file, say, foo.md
, and throw
it into pandoc()
:
library(knitr)
pandoc('foo.md', format='html') # HTML
pandoc('foo.md', format='latex') # LaTeX/PDF
pandoc('foo.md', format='docx') # MS Word
pandoc('foo.md', format='odt') # OpenDocument
But you often need some custom options like what we showed in the beginning. Now we explain how to pass such options to Pandoc.
A simple example
Suppose you want to convert Markdown to HTML with arguments in the first command line example, you can write a config file like this:
t: html
s:
mathjax:
number-sections:
bibliography: foo.bib
o: output.html
You can save it as foo.txt
and run
library(knitr)
pandoc('input.md', format='html', config='foo.txt')
Then knitr will parse this config file and turn it into pandoc
arguments. The empty options such as s
and mathjax
are turned to -s
and --mathjax
respectively, and those non-empty options like bibliography
and o
are converted to --bibliography=foo.bib
and -o output.html
respectively.
The config file
The config file is essentially a Debian Control File. Here are some rules:
- the option name and value are separated by
:
- an option can have a value of multiple lines but all the following lines have to be indented by white spaces
- blank lines are used to separate records (paragraphs)
The first rule is simple. For the second rule, consider bibliography
: when
there are multiple bibliography databases to be passed to Pandoc, we can
write the config file as
bibliography: paper1.bib
paper2.bib
paper3.bib
In this case, it is converted to --bibliography=paper1.bib --bibliography=paper2.bib --bibliography=paper3.bib
and passed to Pandoc.
For the third rule, it is useful when we define multiple output formats in
the config file; below is an example of two records for html
and latex
output:
t: html
s:
mathjax:
number-sections:
bibliography: foo.bib
o: output.html
t: latex
latex-engine: xelatex
s:
number-sections:
output: test.pdf
With this config file, we can call pandoc('input.md', format='latex', config='foo.txt')
and we will get a PDF file test.pdf
.
The name of the config file is obtained from getOption('config.pandoc')
by
default, which means you can set options(config.pandoc = 'path/to/your/config.file')
as a global option. If this option is not set,
the pandoc()
function will look for a file foo.pandoc
where foo
is the
base name of the input file, e.g. it looks for test.pandoc
if the input
file is test.md
. In other words, the config file has the same name as the
Markdown file except that it has a different extension.
Common options
Sometimes we want to share some a few common options across different output
formats. For instance, --number-sections
can be used for both PDF and HTML
output. The record that does not contain the t
tag is treated as common
options for all formats. Now we can rewrite the above config file as:
s:
number-sections:
t: html
mathjax:
bibliography: foo.bib
o: output.html
t: latex
latex-engine: xelatex
output: test.pdf
Note the s
and number-sections
are extracted to a separate record
without a t
tag.
Embedded configurations
We may want to make the Markdown file self-contained in the sense that the
configurations are embedded in it, so we do not need to rely on an external
config file. In this case, we can use a special comment <!--pandoc -->
in
the Markdown file.
<!--pandoc
t: html
s:
mathjax:
number-sections:
bibliography: foo.bib
o: output.html
-->
Now we can pass a single file to other people and they will be able to call
pandoc()
to convert it to the expected format.
If both the config file and embedded configurations are found, they will be combined as if they were from a single file.
Complete examples
See the example 084 (using an external config file) and 088 (using embedded configurations).