R packages are widely used in science, yet the code behind them often escapes scrutiny. To address this gap, rOpenSci has pioneered a peer review process for R packages. The goal of pkginspector is to help that process by providing a means to better understand the internal structure of R packages. It offers tools to analyze and visualize the relationships among functions within a package, and to report whether functions’ interfaces are consistent. If you are reviewing an R package (maybe your own!), pkginspector is for you.
Evolutionary biologists are increasingly using R for building, editing and visualizing phylogenetic trees. The reproducible code-based workflow and comprehensive array of tools available in packages such as ape, phangorn and phytools make R an ideal platform for phylogenetic analysis. Yet the many different tree formats are not well integrated, as pointed out in a recent post.
The standard data structure for phylogenies in R is the “phylo” object, a memory-efficient, matrix-based tree representation. However, non-biologists have tended to use a tree structure called the “dendrogram”, which is a deeply nested list with node properties defined by various attributes stored at each level. While certainly not as memory-efficient as the matrix-based format, dendrograms are versatile and intuitive to manipulate, and hence a large number of analytical and visualization functions exist for this object type. A good example is the dendextend package, which features an impressive range of options for editing dendrograms and plotting publication-quality trees.
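To make the contrast between the two representations concrete, here is a minimal, language-agnostic sketch written in Python rather than R. It is not the code of ape, dendextend, or any other package: the function `nested_to_edge_matrix` and the dictionary keys `"children"` and `"label"` are hypothetical names chosen for illustration. It walks a nested tree and produces a flat table of parent-child edges, following the numbering convention of “phylo” objects (tips numbered from 1, with the root taking the first number after the tips).

```python
from itertools import count


def nested_to_edge_matrix(tree):
    """Convert a nested tree into (edges, tip_labels).

    Assumed input shape (hypothetical, for illustration only):
    leaves are dicts with a "label"; internal nodes are dicts with "children".
    """
    tip_labels = []

    def collect_tips(node):
        # Gather tip labels left to right so tip i maps to tip_labels[i - 1].
        if "children" in node:
            for child in node["children"]:
                collect_tips(child)
        else:
            tip_labels.append(node["label"])

    collect_tips(tree)

    tip_ids = count(1)                          # tips are numbered 1..n
    internal_ids = count(len(tip_labels) + 1)   # root is n + 1, then other internal nodes
    edges = []

    def walk(node, parent_id):
        # Pre-order traversal: number the node, record its edge, then descend.
        is_internal = "children" in node
        node_id = next(internal_ids) if is_internal else next(tip_ids)
        if parent_id is not None:
            edges.append((parent_id, node_id))
        for child in node.get("children", []):
            walk(child, node_id)

    walk(tree, None)
    return edges, tip_labels


# The tree ((A,B),C) expressed as a nested structure.
tree = {"children": [
    {"children": [{"label": "A"}, {"label": "B"}]},
    {"label": "C"},
]}

edges, tips = nested_to_edge_matrix(tree)
print(edges)  # [(4, 5), (5, 1), (5, 2), (4, 3)]
print(tips)   # ['A', 'B', 'C']
```

The nested form keeps each node's information where the node lives, which is what makes it easy to edit by hand. The flat form stores the whole topology in one two-column table, which is why it is the more compact of the two.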
...It’s easy to come to a conference and feel intimidated by the wealth of knowledge and expertise of other attendees. As Ellen Ullman, a software engineer and writer, describes,
I was aware at all times that I had only islands of knowledge separated by darkness; that I was surrounded by chasms of not-knowing, into one of which I was certain to fall.
One of the best ways to start feeling less intimidated is to start talking to others. Ullman continues,
...Data == knowledge! Much of the data we use, whether it be from
government repositories, social media, GitHub, or e-commerce sites, comes
from public-facing APIs. The quantity of data available is truly
staggering, but munging JSON output into a format that is easily
analyzable in R is an equally staggering undertaking. When JSON is
turned into an R object, it usually becomes a deeply nested list riddled
with missing values that is difficult to untangle into a tidy format.
Moreover, every API presents its own challenges; code you’ve written to
clean up data from GitHub isn’t necessarily going to work on Twitter
data, as each API spews data out in its own unique, headache-inducing
nested list structure. To ease and generalize this process, Amanda
Dobbyn proposed an unconf18 project for a general API response tidier!
Welcome roomba, our first stab at easing the process of tidying nested
lists!...
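The general idea of tidying a nested API response can be sketched in a few lines. The example below is written in Python and is only an illustration of the problem, not roomba's implementation or interface. The helpers `find_field` and `tidy_records` and the sample `response` are hypothetical. Each record is searched depth-first for the requested field names, and any field that is absent or null is filled with an explicit missing value so that every row ends up with the same columns.

```python
def find_field(node, key):
    """Depth-first search for the first non-null scalar stored under `key`."""
    if isinstance(node, dict):
        value = node.get(key)
        if value is not None and not isinstance(value, (dict, list)):
            return value
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None  # scalars have nothing to search inside
    for child in children:
        found = find_field(child, key)
        if found is not None:
            return found
    return None


def tidy_records(records, cols):
    """Return one flat row per record, with None standing in for missing values."""
    return [{col: find_field(record, col) for col in cols} for record in records]


# A made-up response with uneven nesting and missing values.
response = [
    {"name": "alice", "profile": {"id": 1, "lang": "R"}},
    {"name": "bob", "profile": {"id": 2, "lang": None}},
    {"name": "carol", "profile": {}},
]

rows = tidy_records(response, ["name", "id", "lang"])
for row in rows:
    print(row)
# {'name': 'alice', 'id': 1, 'lang': 'R'}
# {'name': 'bob', 'id': 2, 'lang': None}
# {'name': 'carol', 'id': None, 'lang': None}
```

Searching by field name rather than by a fixed path is one way to cope with each API nesting its data differently, at the cost of ambiguity when the same name appears at several levels.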
Part of rOpenSci’s mission is to create technical infrastructure in the form of carefully vetted R software tools that lower barriers to working with data sources on the web. Our open peer software review system for community-contributed tools is a key component of this. As the rOpenSci community grows and more package authors submit their work for peer review, we need to expand our editorial board to maintain a speedy process. As our recent post shows, package submissions have grown every year since we started this experiment, and we see no reason they will slow down!...