Reproducible science using Rlogo

2019-11-19

On reproducibility

What is reproducibility in science?

ability to reproduce results by a peer
requires data, methods, and procedures
increasingly, science is supposed to be reproducible

Why does it not happen, in practice?

Some opinions on whether reproducibility is needed:

Ideally, yes but we don't have time for this.
If it gets published, yes.
If it gets published, yes; unless it is in PLoS One…
No need: I work on my own.
For others to copy us? You crazy?!
No way! We rigged the data, the method does not work, and we ran the analyses in Excel.

Main obstacles to reproducibility

lack of time: ultimately, reproducibility is faster
fear of plagiarism: low risks in practice
internal work, no need to share: almost never true
one good reason: lack of tools to facilitate reproducibility

You never work alone

Be nice to your future selves!

Two aspects of reproducibility using

implementing methods as packages
making transparent and reproducible analyses

eproducibility in practice

Literate programming

Let us change our traditional attitude to the construction of programs: instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do.
(Donald E. Knuth, Literate Programming, 1984)

A data-centred approach to programming

Literate programming in

Current workflows use the following equation:

markdown (.md) + = Rmarkdown (.Rmd)

Example:
knitr::knit2html("foo.Rmd") \(\rightarrow\) foo.html
rmarkdown::render("foo.Rmd") \(\rightarrow\) foo.pdf
rmarkdown::render("foo.Rmd") \(\rightarrow\) foo.doc
...

Rmarkdown: chunks in markdown

```{r chunk-title, ...}
a <- rnorm(1000)
hist(a, col = terrain.colors(15), border = "white", main = "Normal distribution")
```

results in:

a <- rnorm(1000)
hist(a, col = terrain.colors(15), border = "white", main = "Normal distribution")

Formatting outputs

```{r another-chunk-title, ...}
[some R code here]
```

where ... are options for processing and formatting, e.g:

eval (TRUE/FALSE): evaluate code?
echo (TRUE/FALSE): show code input?
results ("markup"/"hide"/"asis"): show/format code output
message/warning/error: show messages, warnings, errors?
cache (TRUE/FALSE): cache analyses?

See http://yihui.name/knitr/options for details on all options.

One format, several outputs

rmarkdown can generate different types of documents:

standardised reports (html, pdf)
journal articles. using the rticles package (.pdf)
Tufte handouts (.pdf)
word documents (.doc)
slides for presentations (html, pdf)
…

See: http://rmarkdown.rstudio.com/gallery.html.

`rmarkdown`: toy example 1/2

Let us consider the file :


---
title: "A toy example of rmarkdown"
author: "John Snow"
date: "2019-11-19"
output: html_document
---

This is some nice R code:

```{r rnorm-example}
x <- rnorm(100)
x[1:6]
hist(x, col = "grey", border = "white")
```

`rmarkdown`: toy example 1/2

rmarkdown::render("foo.Rmd")

Good practices

`rmarkdown` is just the beginning

alter your original data

have a messy project

write non-portable code

write horrible code

lose work permanently

How to treat your original data

do not touch your original data
save it as read-only
make copies - you can play with these
track the changes made to the original data

How to avoid messy projects

1 project = 1 folder
subfolders for: data, analyses, figures, manuscripts, …
document the project using a README file
use the Rstudio projects (if you use Rstudio)

How to write portable code?

avoid absolute paths e.g.:
my_file <- "C:\project1\data\data.csv"
use the package here for portable paths e.g.:
my_file <- here("data/data.csv")
avoid special characters and spaces in all names e.g.:
éèçêäÏ*%~!?&
assume case sensitivity:
FooBar \(\neq\) foobar \(\neq\) FOOBAR

How to write better code?

name things explicitly

settle for one naming convention; snake_case is currently recommended for packages

document your code using comments (##)

write simple code, in short sections

use current coding standards – see the lintr package

Example of `lintr`

source: https://github.com/jimhester/lintr

Structuring analysis reports: question-driven report

organised by questions / analysis topics

pros: better narrative

cons: harder code to follow / review

Structuring analysis reports: code-driven report

organised by type of code

pros: easier to read / review code

cons: narrative harder to follow

Structuring analysis reports: hybrid report

differentiates infrastructure vs analysis code
makes question-specific code simple, and repetitive
pros: narrative and code easier to read
cons: harder to design (need frequent re-factoring)

Do not lose your work!

Because you never know what can happen..

How to avoid losing work?

never rely on a single computer to store your work
backups are good, syncing with a server is better (e.g. Dropbox)
use version numbers to track progress
use reportfactory for repeated analysis updates
use version control systems (e.g. GIT) for serious coding projects

Going further

check our golden rules for writing analysis reports
use report factory templates as starting points
use R4epis templates as starting points

On reproducibility

What is reproducibility in science?

Why does it not happen, in practice?

Main obstacles to reproducibility

You never work alone

Two aspects of reproducibility using

eproducibility in practice

Literate programming

A data-centred approach to programming

Literate programming in

Rmarkdown: chunks in markdown

Formatting outputs

One format, several outputs

rmarkdown: toy example 1/2

rmarkdown: toy example 1/2

Good practices

rmarkdown is just the beginning

How to treat your original data

How to avoid messy projects

How to write portable code?

How to write better code?

Example of lintr

Structuring analysis reports: question-driven report

Structuring analysis reports: code-driven report

Structuring analysis reports: hybrid report

Do not lose your work!

How to avoid losing work?

Going further

`rmarkdown`: toy example 1/2

`rmarkdown`: toy example 1/2

`rmarkdown` is just the beginning

Example of `lintr`