Thursday, November 29, 2018. From rOpenSci (https://ropensci.org/blog/2018/11/29/codereview/).
This post is a summary of our Community Call on “Code Review in the Lab, or … How do you review code that accompanies a research project or paper?” held October 16, 2018. Resources associated with the Call, including the video recording, slides, notes and a discussion thread have been posted.
Although there are increasing incentives and pressures for researchers to share code (even for projects that are not primarily computational), practices vary widely and standards are mostly non-existent. The task of reviewing code before publication therefore falls to researchers and research groups. With that in mind, rOpenSci hosted a discussion thread and a community call to bring together researchers for a conversation about current practices and challenges in reviewing code in the lab.
On the call, we heard from Carl Boettiger (UC Berkeley/rOpenSci), Melanie Frazier & Julia Stewart Lowndes (National Center for Ecological Analysis and Synthesis), and Hao Ye (U. Florida/Weecology). Between the presentations, the discussion, and active commenting by many in the community, we learned how different groups approached the problem.
What did we learn? A great deal! The community uses code in many different ways in research, and practices often differ accordingly.
The conversation around code review frequently turned to project structures and coding practices. This reflected a common theme we heard from participants: agreement on workflows and project organization is a minimum prerequisite for a review process.
While different research groups had different standards for these practices, depending on their use cases, all had minimum project standards to make code reviewable. These ranged from simply having an adequate README explaining the project structure and how to run it, to using R Markdown documents, to research compendia and containerized projects with automated testing and continuous integration.
A few themes held across project types. First, documentation was the highest priority - inadequately documented projects were much more difficult to review. Second, as projects grow more complex, it helps to break them up or modularize them, spinning off re-usable code into standalone packages and separating data from analysis code. Third, as projects become more complex or turn into reusable tools rather than one-off analyses, the required stringency of documentation, automated testing, and automated tools like linters increases. These latter two points matter because large, complex projects are challenging to review as a whole. Complex projects were also more reviewable when they produced and communicated outputs from the intermediate steps in pipelines, rather than running as a single black box.
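For example, here is a minimal sketch of what that modularization can look like in R: a small helper pulled out of a one-off script, a unit test a reviewer can run, and a linter pass. The function, file names, and thresholds are hypothetical illustrations, not code from any of the groups on the call.

```r
# R/clean_temperature.R: a re-usable step extracted from an analysis script
# (hypothetical example; the thresholds are illustrative only)
clean_temperature <- function(x) {
  # drop physically implausible readings instead of silently keeping them
  x[x > -50 & x < 60]
}

# tests/testthat/test-clean_temperature.R: an automated check reviewers can run
library(testthat)
test_that("implausible temperatures are removed", {
  expect_equal(clean_temperature(c(12, 999, -80, 25)), c(12, 25))
})

# a linter flags style and common-error issues before human review
# lintr::lint("R/clean_temperature.R")
```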
There was also considerable interest and hope in projects like Binder and Code Ocean (which we learned more about from April Clyburne-Sherin), which provide a shared environment for reviewers to review code and, simultaneously, a minimum project structure to which projects must adhere. As these emerge as standards for code review across labs, they influence practices within labs.
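As an illustration, Binder-style services build a runnable environment from configuration files in the repository; for an R project this can be as simple as a script listing the required packages. The file-name convention below is an assumption based on the Binder/repo2docker documentation (check the current docs for exact conventions), and the packages are placeholders.

```r
# install.R: one of the configuration files Binder can use to set up an
# R environment for a repository (file name assumed from the Binder docs;
# package names are placeholders)
install.packages(c("tidyverse", "here"))
```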
Another area all our presenters emphasized was the need to create a social environment conducive to code review, especially given the mixed backgrounds of most research teams. This involves agreeing on and communicating standards, building a knowledge base, and practicing a culture of feedback.
Standards, like the ones describing project structures, need to be set for the team and communicated. At NCEAS, the Ocean Health Index team has a standard operating procedure document that describes file organization and naming conventions for the large shared code repositories. This document is maintained alongside the repository and referenced in READMEs. Having written standards and workflows is important for collaborating within the team, and also for onboarding new team members.
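As a hypothetical illustration (not the Ocean Health Index team's actual convention), such a document might fix a layout like the following and spell out the conventions each directory follows:

```
project-name/
├── README.md     # what the project does and how to run it
├── data/         # raw inputs, never edited by hand
├── R/            # re-usable functions shared across analyses
├── analysis/     # scripts and R Markdown documents that produce results
└── output/       # generated figures and tables (not edited by hand)
```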
Most teams also had a mix of practices for building up a team knowledge base: written resources like a team wiki, commonly shared practices from external resources like Software Carpentry, and regular team time for formal group learning on topics like abstraction and refactoring. Keeping the code from different projects open and shareable across the team is also important, so that there is a ready supply of examples of previous practice.
Finally, participants emphasized the need to create a culture where people know that mistakes happen and expect to find them. When finding, reviewing, and fixing mistakes is part of the workflow, people are more comfortable discussing and learning from them.
While the concept of standards and standardized workflows was common to most lab groups, and increasingly formalized, the review process itself was not. Code review was frequently described as “more of a process than an event”, “rarely explicit”, and “very ad hoc”, and it took place in a variety of formats: one-on-one meetings, comments on committed code, and pair-programming sessions. In some cases there were more formal processes for specific projects, such as updates to the shared codebase of a major project.
Many presenters and commenters described the importance of using a checklist for review, and many in the discussion thread offered their own checklists, both formal and informal. These checklists vary a great deal - in many cases they cover general areas such as “Can one run the code?” and “Is there documentation?” In cases where projects in a group deal with similar challenges, they can be more specific, for example “Are NAs handled appropriately in joins?” or “Does the pipeline run on new and old versions of the data pulled from version control?”
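To make the NA example concrete, a checklist item like that can often be turned into a small, runnable check; the tables and column names below are hypothetical.

```r
# A concrete check for the "Are NAs handled appropriately in joins?" item
# (hypothetical tables and column names, for illustration only)
library(dplyr)

sites   <- tibble(site_id = c("A", "B", "C"), region = c("N", "N", "S"))
samples <- tibble(site_id = c("A", "B", "D"), value = c(1.2, 3.4, 5.6))

# anti_join() surfaces samples with no matching site: these rows would pick up
# NA metadata in a left_join, so the reviewer can ask whether that is intended
anti_join(samples, sites, by = "site_id")
```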
Some areas cropped up as common challenges. For instance, to what extent should feedback on research code emphasize refactoring or optimization? While it was generally agreed that refactoring matters more for general, highly reused code, it is by no means clear how much more. For rapidly developing research code, how much should testing be emphasized? Another area where the right approach to reviewing was unclear was long-running, computationally intensive workflows: for such projects, can a reviewer or an automated test suite be expected to reproduce the results?
Finally, one useful note was that in-person review meetings can make it harder to maintain a written record of changes than interactions via systems like GitHub. The team from NCEAS noted that they use GitHub issues as a note-taking system, so that in-person discussions are recorded in the same place as online ones, and Weecology used co-authored commits to record changes from pair-programming sessions.
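For reference, GitHub's convention for co-authored commits is a trailer at the end of the commit message, after a blank line; the name and email below are placeholders.

```
Refactor NA handling in site joins

Co-authored-by: Jane Doe <jane.doe@example.com>
```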
How does your group review code that accompanies a research project or manuscript? Have you tried to implement any of these approaches after hearing about them here? Share your experience in the comments below.