Thanks to the first post of the series we know where to observe birds near Radolfzell’s Max Planck Institute for Ornithology, so we could go and do that! Or we can stay behind our laptops and take advantage of eBird, a fantastic bird sightings aggregator! As explained by Matt Strimas-Mackey in his recent blog post, “The eBird database currently contains over 500 million records of bird sightings, spanning every country and over 98% of species, making it an extremely valuable resource for bird research and conservation.”....
This post is the first in a series showcasing various rOpenSci packages as if Maëlle were a birder trying to make the most of R in general and rOpenSci in particular. Although the series' use cases will mostly feature birds, it will be an occasion to highlight rOpenSci packages that are more widely applicable, so read on no matter what your field is! Moreover, each post should stand on its own....
This week, version 2.0 of the mongolite package was released to CRAN. Major new features in this release include support for MongoDB 4.0, GridFS, running database commands, and connection pooling.
Mongolite is primarily an easy-to-use client for getting data in and out of MongoDB. However, it supports a growing number of advanced features such as aggregation, indexing, map-reduce, streaming, encryption, and enterprise authentication. The mongolite user manual provides a great introduction with details and worked examples.
...In this technote I will outline what phylotaR was developed for, how to install it and how to run it with some simple examples.
What is phylotaR?
In any phylogenetic analysis, it is important to identify orthologous sequences: homologous sequences separated by speciation events. This is often done by simply searching an online sequence repository using sequence labels. Relying solely on sequence labels, however, can miss sequences that are unlabelled, have unanticipated names, or have been mislabelled.
...eBird is an online tool for recording bird
observations. The eBird database currently contains over 500 million
records of bird sightings, spanning every country and nearly every bird
species, making it an extremely valuable resource for bird research and
conservation. These data can be used to map the distribution and
abundance of species, and assess how species’ ranges are changing over
time. This dataset is available for download as a text file; however,
this file is huge (over 180 GB!) and, therefore, poses some unique
challenges. In particular, it isn’t possible to import and manipulate
the full dataset in R. Working with these data typically requires
filtering them to a smaller subset of desired observations before
reading into R. This filtering is most efficiently done using AWK, a
Unix utility and programming language for processing column-formatted
text data. The auk package acts as a front end for AWK, allowing users
to filter eBird data before import into R, and provides tools to perform
some important pre-processing of the data. The name of this package
comes from the happy coincidence that the command-line tool AWK, upon
which the package is based, is pronounced the same as auk, the family
of seabirds also known as alcids.
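The reason AWK copes with a 180 GB file is that it never loads the whole thing: it reads one line at a time, splits it into fields on a delimiter, and keeps only the lines whose fields match a condition, roughly what `awk -F'\t' 'NR == 1 || ($1 == "Canada Jay" && $2 == "CA")' sample.txt` would do. The Python sketch below illustrates that streaming-filter idea on a tiny made-up sample. It is not auk's implementation or API; the helper `awk_style_filter` and the column names (`COMMON NAME`, `COUNTRY CODE`, `OBSERVATION COUNT`) are hypothetical, chosen only to resemble a tab-separated eBird-style file.

```python
import io
from typing import Dict, Iterator, List, Set, TextIO


def awk_style_filter(stream: TextIO, criteria: Dict[str, Set[str]]) -> Iterator[List[str]]:
    """Hypothetical helper (not part of auk): stream a tab-separated file and
    yield the header plus every row whose fields match all `criteria`."""
    # Read the header once and map each filtered column name to its position.
    header = stream.readline().rstrip("\n").split("\t")
    wanted = {header.index(name): allowed for name, allowed in criteria.items()}
    yield header
    # Like AWK, handle one line at a time, so memory use stays flat
    # regardless of how large the input file is.
    for line in stream:
        fields = line.rstrip("\n").split("\t")
        if all(i < len(fields) and fields[i] in allowed for i, allowed in wanted.items()):
            yield fields


# Tiny made-up sample standing in for the (very large) eBird text file;
# the column names are illustrative assumptions.
sample = io.StringIO(
    "COMMON NAME\tCOUNTRY CODE\tOBSERVATION COUNT\n"
    "Canada Jay\tCA\t2\n"
    "Atlantic Puffin\tIS\t40\n"
    "Canada Jay\tUS\t1\n"
)

rows = list(awk_style_filter(sample, {"COMMON NAME": {"Canada Jay"}, "COUNTRY CODE": {"CA"}}))
for row in rows:
    print(row)
```

Run as written, this prints the header followed by the single Canada Jay record from Canada. Looking columns up by name in the header, rather than hard-coding field positions, keeps the sketch readable; a real eBird file has many more columns, and auk takes care of building and running the AWK filter for you.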