phylotaR: Retrieve Orthologous Sequences from GenBank

August 8, 2018

By: Dom Bennett

In this technote I will outline what phylotaR was developed for, how to install it and how to run it with some simple examples. What is phylotaR? In any phylogenetic analysis it is important to identify orthologous sequences: homologous sequences separated by speciation events. This is often performed by simply searching an online sequence repository using sequence labels. Relying solely on sequence labels, however, can miss sequences that have not been labelled, have unanticipated names or have been mislabelled.

Extracting and Processing eBird Data

August 7, 2018

By: Matthew Strimas-Mackey

eBird is an online tool for recording bird observations. The eBird database currently contains over 500 million records of bird sightings, spanning every country and nearly every bird species, making it an extremely valuable resource for bird research and conservation. These data can be used to map the distribution and abundance of species, and assess how species’ ranges are changing over time. This dataset is available for download as a text file; however, this file is huge (over 180 GB!

A package for dimensionality reduction of large data

August 1, 2018

By: Sean Hughes  |  Angela Li  |  Ju Kim  |  Malisa Smith  |  Ted Laderas

Motivation Note: Recently, two new UMAP R packages have appeared. These new packages provide more features than umapr does and they are more actively developed. The first is umap, which provides the same Python wrapping function as umapr and also an R implementation, removing the need for the Python version to be installed; it is available on CRAN. The second is uwot, which also provides an R implementation, removing the need for the Python version to be installed.

phylogram: dendrograms for evolutionary analysis

July 12, 2018

By: Shaun Wilkinson

Evolutionary biologists are increasingly using R for building, editing and visualizing phylogenetic trees. The reproducible code-based workflow and comprehensive array of tools available in packages such as ape, phangorn and phytools make R an ideal platform for phylogenetic analysis. Yet the many different tree formats are not well integrated, as pointed out in a recent post. The standard data structure for phylogenies in R is the “phylo” object, a memory-efficient, matrix-based tree representation.

A package for tidying nested lists

June 26, 2018

By: Amanda Dobbyn  |  Jim Hester  |  Laura DeCicco  |  Christine Stawitz  |  Isabella Velasquez

Data == knowledge! Much of the data we use, whether it be from government repositories, social media, GitHub, or e-commerce sites, comes from public-facing APIs. The quantity of data available is truly staggering, but munging JSON output into a format that is easily analyzable in R is an equally staggering undertaking. When JSON is turned into an R object, it usually becomes a deeply nested list riddled with missing values that is difficult to untangle into a tidy format.

