Overview of reproducible research
Reproducible research describes an academic paper or manuscript that is published with its code and data, in addition to what is usually published: the researcher’s interpretation. Because the full analysis used to produce the results is submitted along with the final paper, unaffiliated labs can easily replicate the experimental design and method of analysis, and reviewers can critique them. One way of producing reproducible research is to use R code directly inside your LaTeX document. Two tools in particular have arisen to facilitate combining statistical code with manuscript writing: Sweave, which ships with R, and the knitr package. knitr was designed as a replacement for Sweave, but both combine your R analysis with your LaTeX manuscript (i.e., knitr = R + LaTeX).
One advantage of knitr is that the researcher can easily create ANOVA and demographic tables directly from the data without messing around in Excel. However, as we’ll see, both knitr and Sweave can run into problems when formatting table values to 2 decimal places. In this post, I’ll detail my proposed fix, which can be applied to your entire manuscript by editing the beginning of your knitr preamble.
The basic example below contains the beginning of a hypothetical Methods section of a manuscript. We want to take the values from an R table, which has the breakdown of participants by gender and ethnicity, and display them as numbers in our manuscript.
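To make the example concrete, the R sketch below builds a hypothetical data frame of 200 participants and tabulates them by gender and ethnicity. The data, the column names gender and ethnicity, and the category labels are all assumptions for illustration. In the .Rnw manuscript, the total could then be inserted into a sentence with something like \Sexpr{sum(demo_table)}.

```r
# Hypothetical participant data: 200 undergraduates with made-up gender and
# ethnicity labels (an assumption for illustration, not real data).
set.seed(42)
gender_levels <- c("Female", "Male")
ethnicity_levels <- c("Asian", "Black", "Hispanic", "White", "Other")
participants <- data.frame(
  gender = factor(sample(gender_levels, 200, replace = TRUE),
                  levels = gender_levels),
  ethnicity = factor(sample(ethnicity_levels, 200, replace = TRUE),
                     levels = ethnicity_levels)
)

# Breakdown of participants by gender (rows) and ethnicity (columns).
demo_table <- table(participants$gender, participants$ethnicity)
print(demo_table)

# Total number of participants; in an .Rnw file this is the kind of
# expression you would place inside \Sexpr{}.
n_total <- sum(demo_table)
n_total
```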
Running the knit() command on our knitr manuscript inside R produces a regular LaTeX file that can be compiled to a PDF using pdflatex or TeX Shop. In that LaTeX file, the R table objects have been replaced with LaTeX tables.
Finally, compiling the LaTeX file in TeX Shop produces the finished PDF.
Summary thus far
The example above used data from R directly in a sentence in the Methods section (i.e., “We recruited 200 university undergraduates from an introductory psychology class.”) by means of the \Sexpr{} command in the knitr manuscript (i.e., knitr.Rnw). The \Sexpr{} command contained an R expression that calculated the total number of participants. This expression was evaluated and converted to LaTeX code when we ran the knit() function on the .Rnw file, which produces a .tex document. The .tex document contained no R code and was therefore ready to be compiled to a PDF using TeX Shop or pdflatex in Terminal.app.
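To make the substitution step concrete, here is a toy, base-R sketch of what evaluating \Sexpr{} amounts to: find each \Sexpr{...}, evaluate the R expression inside it, and paste the result back into the text. The helper render_sexpr() is hypothetical and is not how knitr is implemented; it also ignores nested braces and any number formatting.

```r
# Hypothetical helper: replace every \Sexpr{expr} in a line of text with the
# value of expr. A toy illustration only, not knitr's implementation.
render_sexpr <- function(line, envir = parent.frame()) {
  force(envir)  # capture the caller's environment up front
  # A literal "\Sexpr{", then anything except "}", then a closing "}".
  pattern <- "\\\\Sexpr\\{([^}]*)\\}"
  matches <- gregexpr(pattern, line, perl = TRUE)
  calls <- regmatches(line, matches)[[1]]
  values <- vapply(calls, function(call) {
    code <- sub(pattern, "\\1", call, perl = TRUE)  # the R expression inside
    value <- eval(parse(text = code), envir = envir)
    paste(as.character(value), collapse = ", ")
  }, character(1), USE.NAMES = FALSE)
  regmatches(line, matches) <- list(values)
  line
}

n_participants <- 200
render_sexpr("We recruited \\Sexpr{n_participants} university undergraduates.")
# returns "We recruited 200 university undergraduates."
```

Like the .tex file produced by knit(), the returned string contains only the evaluated values, not the R code that computed them.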
Forcing knitr to round to 2 decimal places
The default behavior of knitr works well most of the time. However, what if our data table didn’t contain whole numbers? What if it contained percentages that we wanted to round to 2 decimal places, as many journals require? For example, the value \Sexpr{pi} would be evaluated and replaced with 3.141593 in the LaTeX file. One common problem with Sweave, and part of Yihui’s motivation for replacing it with knitr, is that \Sexpr{} doesn’t automatically round digits.
In Sweave (i.e., knitr’s predecessor), each value of pi would have to be wrapped in round(pi,2), so we end up writing \Sexpr{round(pi,2)}. Yihui fixed this problem in knitr by rounding inline numbers automatically; the number of digits is set with options(digits=2) in the knitr preamble of your .Rnw document.
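The difference between the two approaches can be sketched in plain R. The Sweave-style version wraps each value in round(); the knitr-style version sets options(digits = 2) once and lets the inline hook do the rounding. The second half assumes the hook rounds with round(x, getOption("digits")), which approximates the behavior described here rather than reproducing knitr’s source.

```r
# Sweave style: every inline value is rounded by hand, e.g. \Sexpr{round(pi,2)}.
as.character(round(pi, 2))
# "3.14"

# knitr style, as described above: set the number of digits once ...
old_opts <- options(digits = 2)

# ... and let the inline hook round every number. This line approximates
# that rounding step; it is not knitr's own code.
as.character(round(pi, getOption("digits")))
# "3.14"

# Caveat: options(digits) also sets the significant digits used when printing
# to the console, so print(pi) now shows 3.1 rather than 3.14.
print(pi)

options(old_opts)  # restore the previous setting
```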
The default rounding behavior of knitr works well until a value ends in 0 after rounding, such as 123.10. Running the expression round(123.10,2) outputs 123.1. In a manuscript table, every other value would then be aligned at the decimal place except for this unlucky one, sticking out like a sore thumb. To fix this, you could use sprintf("%.2f", pi) every time you call \Sexpr{} in the manuscript, but then what’s the advantage of using knitr? This hack unnecessarily complicates the manuscript and distracts from the writing process.
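The trailing-zero problem, and the sprintf() workaround, can be seen directly in R. The three values below are hypothetical, chosen so that one of them (123.10) ends in 0 after rounding.

```r
# Hypothetical column of values that should all show 2 decimal places.
values <- c(12.347, 123.10, 7.891)

# round() returns numbers, and converting them to text drops the trailing
# zero, so 123.10 comes out as "123.1".
as.character(round(values, 2))
# "12.35" "123.1" "7.89"

# sprintf() with a fixed "%.2f" format always keeps exactly 2 decimal places.
sprintf("%.2f", values)
# "12.35" "123.10" "7.89"
```

The key difference is that round() changes the number, while sprintf() controls how the number is written as text, which is what a typeset table needs.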
Modify the default inline_hook for knitr
After seeing a StackOverflow answer by Josh O’Brien, I realized that knitr’s default inline_hook function could easily be modified to use the sprintf() command instead of round(). This small change forces every numeric value in the manuscript to be printed with 2 decimal places.
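An inline hook is simply a function that receives the value of each inline expression and returns the text to insert. A simplified approximation of the default behavior described in this post might look like the sketch below; the name default_inline_hook is hypothetical, and this is not knitr’s actual source.

```r
# A simplified approximation of knitr's default inline hook, as described in
# this post: numbers are rounded to getOption("digits") decimal places and
# vectors are collapsed into one comma-separated string.
# (An illustrative sketch, not knitr's actual source code.)
default_inline_hook <- function(x) {
  if (is.numeric(x)) x <- round(x, getOption("digits"))
  paste(as.character(x), collapse = ", ")
}

old_opts <- options(digits = 2)
default_inline_hook(123.10)      # "123.1"  (trailing zero lost)
default_inline_hook(c(pi, 2.5))  # "3.14, 2.5"
options(old_opts)
```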
Note: My original code for this post used the format() command. Winston Chang pointed out that this could lead to unreliable output and tweaked the code to use sprintf(); the credit for the more efficient function goes to him. We then add our improved inline_hook to the preamble of our knitr document.
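A minimal sketch of such an improved hook, assuming only that it formats numbers with sprintf("%.2f", ...) as described above; the body is illustrative and not necessarily identical to the author’s or Winston Chang’s code. knitr::knit_hooks$set(inline = ...) is the knitr mechanism for replacing the inline hook; the call is guarded here so the snippet also runs where knitr is not installed.

```r
# Format every numeric inline value with exactly 2 decimal places via
# sprintf(); non-numeric values (e.g. text) pass through unchanged.
inline_hook <- function(x) {
  if (is.numeric(x)) {
    x <- sprintf("%.2f", x)
  }
  paste(as.character(x), collapse = ", ")
}

# Register the hook. In a real manuscript this line belongs in the first
# (preamble) chunk of the .Rnw file.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::knit_hooks$set(inline = inline_hook)
}

inline_hook(123.1)     # "123.10"
inline_hook(c(pi, 2))  # "3.14, 2.00"
inline_hook("Female")  # "Female"
```

One consequence of this design is that whole numbers, such as a participant count of 200, also print as 200.00. If that is unwanted, the hook could check for whole numbers and format them without decimals.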
Working Example
Let’s put it all together in a working example of the suggested knitr inline_hook function, which should give more reliable output by rounding inline values to 2 decimal places.
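A hedged sketch of such a working example, written as an R script so that it is self-contained. It writes a minimal, hypothetical .Rnw file (the file name knitr-example.Rnw and its contents are assumptions) whose first chunk registers a 2-decimal inline hook, then knits it to .tex if knitr is installed. The hook is registered inside the document’s setup chunk rather than before calling knit(), following the advice above to put it in the preamble.

```r
# A minimal, hypothetical .Rnw document: the setup chunk registers a
# 2-decimal inline hook, and the text then uses \Sexpr{}.
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<setup, include=FALSE>>=",
  "inline_hook <- function(x) {",
  "  if (is.numeric(x)) x <- sprintf('%.2f', x)",
  "  paste(as.character(x), collapse = ', ')",
  "}",
  "knitr::knit_hooks$set(inline = inline_hook)",
  "@",
  "Pi is \\Sexpr{pi}, and a value with a trailing zero is \\Sexpr{123.1}.",
  "\\end{document}"
)

rnw_file <- file.path(tempdir(), "knitr-example.Rnw")
writeLines(rnw, rnw_file)

# knit() turns the .Rnw into a .tex file in which every \Sexpr{} has been
# replaced by the text returned from the inline hook.
if (requireNamespace("knitr", quietly = TRUE)) {
  tex_file <- knitr::knit(rnw_file,
                          output = sub("\\.Rnw$", ".tex", rnw_file),
                          quiet = TRUE)
  cat(readLines(tex_file), sep = "\n")
}
```

In the resulting .tex file, both inline values should appear with two decimal places; the file can then be compiled with pdflatex or TeX Shop as before.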