Evolutionary biologists are increasingly using R for building, editing and visualizing phylogenetic trees. The reproducible code-based workflow and comprehensive array of tools available in packages such as ape, phangorn and phytools make R an ideal platform for phylogenetic analysis. Yet the many different tree formats are not well integrated, as pointed out in a recent post.
The standard data structure for phylogenies in R is the “phylo” object, a memory-efficient, matrix-based tree representation. However, non-biologists have tended to use a tree structure called the “dendrogram”, which is a deeply nested list with node properties defined by various attributes stored at each level. While certainly not as memory efficient as the matrix-based format, dendrograms are versatile and intuitive to manipulate, and hence a large number of analytical and visualization functions exist for this object type. A good example is the dendextend package, which features an impressive range of options for editing dendrograms and plotting publication-quality trees.
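To make the contrast between the two structures concrete, here is a minimal sketch. It assumes the ape package is installed, uses a made-up four-tip Newick string, and converts via `as.hclust()`, which is only one possible route between the formats and not necessarily the one the original post goes on to describe.

```r
library(ape)  # assumption: ape is installed

# A small ultrametric tree in Newick format (made-up example data)
tree <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")

class(tree)     # "phylo"
tree$edge       # two-column matrix, one row per edge (parent node, child node)
tree$tip.label  # character vector of tip names

# Convert to a dendrogram by way of hclust; ape's as.hclust() method for
# phylo objects needs a rooted, binary, ultrametric tree
dend <- as.dendrogram(as.hclust(tree))

class(dend)                # "dendrogram"
length(dend)               # number of branches nested directly under the root
attr(dend, "members")      # number of leaves below the root
attr(dend[[1]], "height")  # each nested node carries its own attributes
```

The phylo object keeps the whole topology in one edge matrix, whereas the dendrogram stores it as lists within lists, so inspecting a subtree is a matter of indexing with `[[`.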
It’s easy to come to a conference and feel intimidated by the wealth of knowledge and expertise of other attendees. As Ellen Ullman, a software engineer and writer, describes,
I was aware at all times that I had only islands of knowledge separated by darkness; that I was surrounded by chasms of not-knowing, into one of which I was certain to fall.
One of the best ways to start feeling less intimidated is to start talking to others. Ullman continues, ...
Data == knowledge! Much of the data we use, whether it be from
government repositories, social media, GitHub, or e-commerce sites comes
from public-facing APIs. The quantity of data available is truly
staggering, but munging JSON output into a format that is easily
analyzable in R is an equally staggering undertaking. When JSON is
turned into an R object, it usually becomes a deeply nested list riddled
with missing values that is difficult to untangle into a tidy format.
Moreover, every API presents its own challenges; code you’ve written to
clean up data from GitHub isn’t necessarily going to work on Twitter
data, as each API spews data out in its own unique, headache-inducing
nested list structure. To ease and generalize this process, Amanda
Dobbyn proposed an
unconf18 project for a general API response tidier! Welcome roomba,
our first stab at easing the process of tidying nested lists!...
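
As a rough illustration of the problem described above, the following base R sketch flattens a small nested list with missing values into a data frame. The `resp` object, its field names, and the `get_int()` helper are all hypothetical, and this is hand-written tidying of the kind roomba aims to generalize, not roomba's own interface.

```r
# Hypothetical API response, already parsed from JSON into a nested list;
# NULL marks a value the API did not return
resp <- list(
  list(name = "a", stats = list(stars = 10L, forks = NULL)),
  list(name = "b", stats = list(stars = NULL, forks = 3L))
)

# Hypothetical helper: turn a missing (NULL) value into an integer NA
get_int <- function(x) if (is.null(x)) NA_integer_ else as.integer(x)

# Pull each field out of every record, one column at a time
tidy <- data.frame(
  name  = vapply(resp, function(rec) rec$name, character(1)),
  stars = vapply(resp, function(rec) get_int(rec$stats$stars), integer(1)),
  forks = vapply(resp, function(rec) get_int(rec$stats$forks), integer(1)),
  stringsAsFactors = FALSE
)

tidy
```

Every field has to be named and extracted by hand here, which is why code written for one API's response rarely carries over to another.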
Part of rOpenSci’s mission is to create technical infrastructure in the form of carefully vetted R software tools that lower barriers to working with data sources on the web. Our open peer software review system for community-contributed tools is a key component of this. As the rOpenSci community grows and more package authors submit their work for peer review, we need to expand our editorial board to maintain a speedy process. As our recent post shows, package submissions have grown every year since we started this experiment, and we see no reason they will slow down!...
You can find members of the rOpenSci team at various meetings and workshops around the world. Come say ‘hi’, learn how our software packages can enable your research, find out about our process for open peer software review and onboarding, ask how you can get connected with the community, or tell us how we can help you do open and reproducible research....