The 501st Reminder About Reproducible Examples

Yihui Xie 2017-10-19

First things first. This post is not meant to complain. It is just to document one “miserable” aspect of a software developer’s (daily) life. I wrote about the “reproducible example paradox” last month, and said I probably had reminded users 500 times of providing a reproducible example.

Perhaps it was the 501st time this morning in rstudio/markdown#86.

I’m okay with constantly reminding users, and I totally have the patience for that. What makes me sad is that sometimes my reminders are simply ignored.

Below is how I started a (sort of) typical morning.

Last night I saw rstudio/markdown#86 (I’m a night owl), and I told the issue poster it was a wrong repo. I can understand that the two repositories markdown and rmarkdown can be confusing to users. This has happened a few times, and I’m fine with that.

I asked for a reproducible example when she1 files the issue to a different repo (no matter if it is rmarkdown or shiny). This morning when I started to triage rmarkdown issues, I saw rstudio/rmarkdown#1160. There was not a reproducible example.

I felt the issue might have been posted to the shiny repo, too. I went there, and indeed the same issue rstudio/shiny#1872 was there. Nowadays I no longer actively watch the shiny repo, so I won’t see an issue unless I go there and check it.

So it was cross-posted, but not mentioned.

My colleague Winston asked for a reproducible example, too (the 502nd time?). Then she provided an example. I started to investigate the issue with her example, and quickly found I had to correct the code in a few places before I was able to run it, such as the missing library(data.table) and library(ggplot2).

It seemed Winston did exactly the same thing. Finally both of us independently corrected her code, and came up with the actual reproducible example.

Then Winston worked harder to create another reproducible example, and filed the issue rstudio/rmarkdown#1162. When I saw this example, I tried to turn it into a pure Shiny app without R Markdown, the problem didn’t exist in the Shiny app.

Only at this point, I was sure it was probably rmarkdown’s fault, which means I should be responsible for it. After about half an hour, I put together the fix in rstudio/rmarkdown#1165.

So in the end, four GitHub issues (actually there was one more), one pull request, two users, and two developers in total were involved. In an ideal world, one GitHub issue plus one minimal reproducible example should suffice.

Why does this happen all the time? I think we can find very thoughtful explanations in Peter Baumgartner’s recent blog post “WHAT IS OBVIOUS AND FOR WHOM?”. The world views of users and developers/authors are just different. My conclusion, as pessimistic as it is, is that this problem does not and will not have a solution. When a problem does not have a solution, it is not a problem, but a condition (that you must accept). My pessimistic conclusion mainly comes from the fact that the number of users is significantly larger than developers. As Peter pointed out:

We are explaining the same thing over and over but – to different people, students learners, classes etc. Even if we explicitly mention common errors and misunderstanding: there are always some learners who will commit these mistakes.

Educating one (different) person at a time is just too slow. I don’t even think educating a hundred at a time helps. The best thing I could do is to try to be emotionless when I feel the negative energy is coming up to my mind, and say “please provide a minimal, self-contained, and reproducible example” like a robot. Well, in fact, this is one of the saved replies in my GitHub account, so the good thing is that I don’t have to type it every time.

dull cat

  1. I apologize if I got the gender wrong. ↩︎