Friday, September 8, 2017 From rOpenSci (https://ropensci.org/blog/2017/09/08/first-review-experiences/). Except where otherwise noted, content on this site is licensed under the CC-BY license.
It all started January 26th this year when I signed up to volunteer as a reviewer for R packages submitted to rOpenSci. My main motivation for wanting to volunteer was to learn something new and to contribute to the R open source community. If you are wondering why the people behind rOpenSci are doing this, you can read How rOpenSci uses Code Review to Promote Reproducible Science.
Three months later I was contacted by Maëlle Salmon asking whether I was interested in
reviewing the R package patentsview
for rOpenSci. And yes, I
was! To be honest I was a little bit thrilled.
The packages are submitted for review to rOpenSci via an issue to their
GitHub repository and also the reviews happen there. So you can check out
all previous package submissions and reviews.
With all the information you
get from rOpenSci and also the help from the editor it is straightforward
to do the package review. Before I started I read the
reviewer guides (links below) and checked out a few of the existing
reviews. I installed the package patentsview
from GitHub and also
downloaded the source code so I could check out how it was implemented.
I started by testing core functionality of the package by running the examples that were mentioned in the README of the package. I think this is a good starting point because you get a feeling of what the author wants to achieve with the package. Later on I came up with my own queries (side note: this R package interacts with an API from which you can query patents). During the process I used to switch between writing queries like a normal user of the package would do and checking the code. When I saw something in the code that wasn’t quite clear to me or looked wrong I went back to writing new queries to check whether the behavior of the methods was as expected.
With this approach I was able to give feedback to the package author which led to the inclusion of an additional unit test, a helper function that makes the package easier to use, clarification of an error message and an improved documentation. You can find the review I did here.
There are several R packages that helped me get started with my review,
e.g. devtools
and
goodpractice
. These
packages can also help you when you start writing your own packages. An
example for a very useful method is devtools::spell_check()
, which
performs a spell check on the package description and on manual pages.
At the beginning I had an issue with goodpractice::gp()
but Maelle Salmon
(the editor) helped me resolve it.
In the rest of this article you can read what I gained personally from doing a review.
When people think about contributing to the open source community, the first thought is about creating a new R package or contributing to one of the major packages out there. But not everyone has the resources (e.g. time) to do so. You also don’t have awesome ideas every other day which can immediately be implemented into new R packages to be used by the community. Besides contributing with code there are also lots of other things than can be useful for other R users, for example writing blog posts about problems you solved, speaking at meetups or reviewing code to help improve it. What I like much about reviewing code is that people see things differently and have other experiences. As a reviewer, you see a new package from the user’s perspective which can be hard for the programmer themselves. Having someone else review your code helps finding things that are missing because they seem obvious to the package author or detect code pieces that require more testing. I had a great feeling when I finished the review, since I had helped improve an already amazing R package a little bit more.
When I write R code I usually try to do it in the best way possible.
Google’s R Style Guide
is a good start to get used to coding best practice in R and I also
enjoyed reading Programming Best Practices
Tidbits. So normally
when I think some piece of code can be improved (with respect to speed,
readability or memory usage) I check online whether I can find a
better solution. Often you just don’t think something can be
improved because you always did it in a certain way or the last time you
checked there was no better solution. This is when it helps to follow
other people’s code. I do this by reading their blogs, following many R
users on Twitter and checking their GitHub account. Reviewing an R
package also helped me a great deal with getting new ideas because I
checked each function a lot more carefully than when I read blog posts.
In my opinion, good code does not only use the best package for each
problem but also the small details are well implemented. One thing I
used to do wrong for a long time was filling of data.frames until I
found a better (much faster)
solution on stackoverflow.
And with respect to this you
can learn a lot from someone else’s code. What I found really cool in
the package I reviewed was the usage of small helper functions (see
utils.R).
Functions like paste0_stop
and paste0_message
make the rest of the
code a lot easier to read.
When reviewing an R package, you check the code like a really observant user. I noticed many things that you usually don’t care about when using an R package, like comments, how helpful the documentation and the examples are, and also how well unit tests cover the code. I think that reviewing a few good packages can prepare you very well for writing your own packages.
If I motivated you to become an rOpenSci reviewer, please sign up! Here is a list of useful things if you want to become an rOpenSci reviewer like me.
While writing this blog post I found a nice article about contributing to the tidyverse which is mostly also applicable to other R packages in my opinion.
If you are generally interested in either submitting or reviewing an R package, I would like to invite you to the Community Call on rOpenSci software review and onboarding.