rOpenSci Blog

High Performance CommonMark and Github Markdown Rendering in R

Jeroen Ooms — December 2, 2016
This week the folks at Github have open sourced their fork of libcmark (based on the extensive PR by Mathieu Duponchelle), which they use to render markdown text within documents, issues, comments and anything else on the Github website. The new release of the commonmark R package incorporates this library so that we can take advantage of Github quality markdown rendering in R. The most exciting change is that the library has gained an extension...

The rOpenSci geospatial suite

Scott Chamberlain — November 22, 2016
Geospatial data - data embedded in a spatial context - is used across disciplines, whether it be history, biology, business, tech, public health, etc. Along with community contributors, we're working on a suite of tools to make working with spatial data in R as easy as possible. If you're not familiar with geospatial tools, it's helpful to see what people do with them in the real world. Example 1 One of our geospatial packages, geonames,...

The new Tesseract package: High Quality OCR in R

Jeroen Ooms — November 16, 2016
Optical character recognition (OCR) is the process of extracting written or typed text from images such as photos and scanned documents into machine-encoded text. The new rOpenSci package tesseract brings one of the best open-source OCR engines to R. This enables researchers or journalists, for example, to search and analyze vast numbers of documents that are only available in printed form. People looking to extract text and metadata from pdf files in R should try...

Chat with the rOpenSci team at upcoming meetings

Stefanie Butland — November 9, 2016
You can find members of the rOpenSci team at various meetings and workshops around the world. Come say 'hi', learn about how our packages can enable your research, or about our onboarding process for contributing new packages, discuss software sustainability or tell us how we can help you do open and reproducible research. Where's rOpenSci? November 2016 to February 2017 When Who Where What Nov 13–18, 2016 Dan Katz Salt Lake City, US SC16 Nov...

Community Call v12 - How do I create a code of conduct for my event/lab/codebase?

Stefanie Butland — October 31, 2016
In order to facilitate a transformation towards open and reproducible research, rOpenSci is building and improving not only the technical infrastructure, but the social infrastructure as well. To support this, occasionally a Community Call will focus on a topic that reflects the values of rOpenSci. The first of these, on Thursday, December 15th, 8-9 AM PST, will be on "How do I create a code of conduct for my event/lab/codebase?". Agenda Welcome (5 min, Stefanie...

Greetings from Your Community Manager!

Stefanie Butland — October 12, 2016
I feel both proud and privileged to join rOpenSci as your Community Manager. I’ve been a compulsive community builder since the early 2000s, but it has rarely been part of my job description. Now it seems like all roads have led to this. After a couple of fine days of indoctrination at the UC Berkeley home of rOpenSci, I’m settled into work in beautiful Kamloops, British Columbia, Canada. So much of my perspective of rOpenSci...

Postdoctoral Scholar – Sustainable Software and Reproducible Research

Karthik Ram — September 7, 2016
The rOpenSci project based at the University of California, Berkeley seeks to hire a postdoctoral scholar to work on the research activities funded by the grant titled “Fostering the next generation of sustainable software and reproducible research practices in the scientific community”. The project develops open source software to promote reproducible research practices in the scientific community. The postdoctoral scholar will focus on a research topic aligned with their own interests in order to better...

Advanced Image-Processing in R with Magick, Part I

Jeroen Ooms — August 23, 2016
The new magick package is an ambitious effort to modernize and simplify high-quality image processing in R. It wraps the ImageMagick STL which is perhaps the most comprehensive open-source image processing library available today. The ImageMagick library has an overwhelming amount of functionality. The current version of Magick exposes a decent chunk of it, but being a first release, documentation is still sparse. This post briefly introduces the most important concepts to get started. There...

New package tokenizers joins rOpenSci

Lincoln Mullen — August 23, 2016
The R package ecosystem for natural language processing has been flourishing in recent days. R packages for text analysis have usually been based on the classes provided by the NLP or tm packages. Many of them depend on Java. But recently there have been a number of new packages for text analysis in R, most notably text2vec, quanteda, and tidytext. These packages are built on top of Rcpp instead of rJava, which makes them much...

rotl paper published

Francois Michonneau, Joseph Brown, David Winter — July 26, 2016
We are excited to announce a paper describing rotl, our package for the Open Tree of Life data, has been published. The full citation is: Michonneau, F., Brown, J. W. and Winter, D. J. (2016), rotl: an R package to interact with the Open Tree of Life data. Methods Ecol Evol. doi: https://doi.org/10.1111/2041-210X.12593 The paper, which is freely available, describes the package and the data it wraps in detail. Rather than rehash the information here,...

Testing packages with R Travis for OS-X

Jeroen Ooms — July 12, 2016
Travis is a continuous integration service which allows for running automated testing code every time you push to GitHub. Hadley's book about R packages explains how and why R package authors should take advantage of this in their development process. The build matrix Travis is now providing support for multiple operating systems, including Ubuntu 14.04 (Trusty) and various flavors of Mac OS-X. Jim Hester has done a great job of tweaking the travis R-language build script...

Australia Unconference

Jessie Roberts, Miles McBain, Nicholas Tierney — June 16, 2016
On April 21st and 22nd of 2016, we had 40 members of the R community gather in Brisbane, Australia, with the goal of reproducing the rOpenSci Unconference events that have been running with great success in San Francisco since 2014. Like all event organisers ever, we went through the usual crisis: Where will it be? Will anyone actually show up? Is the problem space over venue, date, attendees, catering, sponsors convex? Is it even possible...

Software sustainability research with rOpenSci

Daniel S. Katz — May 25, 2016
I’m happy to announce that I’ve started a project with rOpenSci under their recent award from the Helmsley Foundation. My work with rOpenSci will focus on sustainability of the project itself. Sustainability can be defined as having the resources to do the necessary work to continue and grow rOpenSci. This is one of the most difficult challenges for rOpenSci and for many other research software projects. rOpenSci has a very broad and very ambitious goal,...

Onboarding at rOpenSci: A Year in Reviews

Noam Ross, Carl Boettiger, Jenny Bryan, Scott Chamberlain, Rich FitzJohn, Karthik Ram — March 28, 2016
Code review, in which peers manually inspect the source code of software written by others, is widely recognized as one of the best tools for finding bugs in software. Code review is relatively uncommon in scientific software development, though. Scientists, despite being familiar with the process of peer review, often have little exposure to code review due to lack of training and historically little incentive to share the source code from their research. So scientific...

rOpenSci geospatial libraries

Scott Chamberlain — March 17, 2016
Geospatial data input/output, manipulation, and visualization are tasks that are common to many disciplines. Thus, we're keenly interested in making great tools in this space. We have an increasing set of spatial tools, each of which we'll cover sparingly. See the CRAN and GitHub badges for more information. We are not trying to replace the current R geospatial libraries - rather, we're trying to fill in gaps and create smaller tools to make it easy...

We're hiring a community manager!

Core Team — March 10, 2016
The rOpenSci team is growing, thanks in part to our recent funding. We recently welcomed Jeroen Ooms on the software development side and today we're thrilled to announce a position for community manager. Our mission is to expand access to scientific data and promote a culture of reproducible research and sustainable research software. We aim to cultivate a vibrant and open community through activities such as our community calls, discussion forums, package review, and annual...

Australian rOpenSci Unconference

Nicholas Tierney — March 9, 2016
The rOpenSci Unconference is coming to Australia and we are excited!! The event will take place in sunny Brisbane, on April 21-22 2016 hosted at the Microsoft Innovation Centre. You can find more information about the event and how to register at http://auunconf.ropensci.org/. I was completely and unceremoniously thrown into the deep end when I first started learning R. Contrary to what I initially thought possible, I am now irreversibly converted to the ideology of...

Introducing pdftools - A fast and portable PDF extractor

Jeroen Ooms — March 1, 2016
Scientific articles are typically locked away in PDF format, a format designed primarily for printing but not so great for searching or indexing. The new pdftools package allows for extracting text and metadata from PDF files in R. From the extracted plain text one could find articles discussing a particular drug or species name, without having to rely on publishers providing metadata, or pay-walled search engines. The pdftools package slightly overlaps with the Rpoppler package by Kurt...

Help us prioritize what to build in 2016

Karthik Ram — January 7, 2016
We've got a big year ahead of us as we work towards expanding our team and organizing various events and activities. We remain committed to supporting and expanding the landscape of open source tools that are available to researchers. While much of our focus has been around making it easier to access various data repositories, we are keen on improving other parts of the research pipeline, including data munging, documentation and sharing. To help us...

rOpenSci Announces $2.9M Award from the Helmsley Charitable Trust

Karthik Ram — November 19, 2015
rOpenSci, whose mission is to develop and maintain sustainable software tools that allow researchers to access, visualize, document, and publish open data on the Web, is pleased to announce that it has been awarded a grant of nearly $2.9 million over three years from The Leona M. and Harry B. Helmsley Charitable Trust. The grant, which was awarded through the Trust’s Biomedical Research Infrastructure Program, will be used to expand rOpenSci’s mission of developing tools...

Rentrez 1_0 released

David Winter — September 24, 2015
A new version of rentrez, our package for the NCBI's EUtils API, is making its way around the CRAN mirrors. This release represents a substantial improvement to rentrez, including a new vignette that documents the whole package. This post describes some of the new things in rentrez, and gives us a chance to thank some of the people that have contributed to this package's development. Thanks Thanks to everyone who has filed an issue or...

A drat repository for rOpenSci

Carl Boettiger — August 4, 2015
We're happy to announce the launch of a CRAN-style repository for rOpenSci at http://packages.ropensci.org This repository contains the latest nightly builds from the master branch of all rOpenSci packages currently on GitHub. This allows users to install development versions of our software without specialized functions such as install_github(), allows dependencies not hosted on CRAN to still be resolved automatically, and permits the use of update.packages(). Using the repository To use, simply add packages.ropensci.org to your...

The challenge of combining 176 x #otherpeoplesdata to create the Biomass And Allometry Database

Daniel Falster, Rich FitzJohn, Remko Duursma, Diego Barneche — June 3, 2015
Despite the hype around "big data", a more immediate problem facing many scientific analyses is that large-scale databases must be assembled from a collection of small independent and heterogeneous fragments -- the outputs of many and isolated scientific studies conducted around the globe. Collecting and compiling these fragments is challenging at both political and technical levels. The political challenge is to manage the carrots and sticks needed to promote sharing of data within the scientific...

Database interfaces

Scott Chamberlain — May 20, 2015
There are many different databases. The most familiar are row-column SQL databases like MySQL, SQLite, or PostgreSQL. Another type of database is the key-value store, which as a concept is very simple: you save a value specified by a key, and you can retrieve a value by its key. One more type is the document database, which instead of storing rows and columns, stores blobs of text or even binary files. The key-value and document...

Introducing a Wishlist for Scientific R Packages

Oliver Keyes — March 10, 2015
There are two things that make R such a wonderful programming environment - the vast number of packages to access, process and interpret data, and the enthusiastic individuals and subcommunities (of which rOpenSci is a great example). One, of course, flows from the other: R programmers write R packages to provide language users with more features, which makes everyone's jobs easier and (hopefully!) attracts more users and more contributions. But what if you have an...

Curling - exploring web request options

Scott Chamberlain — December 18, 2014
rOpenSci specializes in creating R libraries for accessing data resources on the web from R. Most times you request data from the web in R with our packages, you should have no problem. However, you will eventually run into problems. In addition, there are things you can do by modifying requests to web resources that fall in the advanced stuff category. Underlying almost all of our packages are requests to web resources served over the...

Community calls

Scott Chamberlain — December 15, 2014
Key to the success of rOpenSci is our community and we want to hear more regularly from our members, and foster new interactions among the group. In addition, community calls are a way for us to give important updates, and get feedback on them. We tentatively plan on doing community calls once per month. The format of rOpenSci community calls could be of various types. We could have community members show off software they've been...

Growth of open data in biology

Scott Chamberlain — November 10, 2014
Why open data growth At rOpenSci we try to make it easier for people to use open data and contribute open data to the community. The question often arises: How much open data do we have? Another angle on this topic is: How much is open data growing? We provide access to dozens of data repositories through our various packages. We asked many of them to share numbers on the amount of data they have,...

Introducing Rocker: Docker for R

Carl Boettiger, Dirk Eddelbuettel — October 23, 2014
You only know two things about Docker. First, it uses Linux containers. Second, the Internet won't shut up about it. -- attributed to Solomon Hykes, Docker CEO So what is Docker? Docker is a relatively new open source application and service, which is seeing interest across a number of areas. It uses recent Linux kernel features (containers, namespaces) to shield processes. While its use (superficially) resembles that of virtual machines, it is much more lightweight...

New fiscal sponsorship agreement with NumFocus foundation

Karthik Ram — October 1, 2014
I’m very pleased to announce that rOpenSci has signed a comprehensive fiscal sponsorship agreement with the NumFocus foundation, a 501(c)3 nonprofit that supports R&D for open source scientific software projects. We are delighted to be in the company of esteemed projects such as IPython and Julia that share our goal of promoting reproducible research practices across many scientific communities and developing a rich ecosystem of tools for open scientific computing. All of our activities, from...

NCEAS Codefest Follow-up

Scott Chamberlain, Ted Hart — September 23, 2014
The week after Labor Day, we had the pleasure of attending the NCEAS open science codefest event in Santa Barbara. It was great to meet folks like the new arrivals at the expanding Mozilla Science Lab, Bill Mills and Abby Cabunoc (Bill even already has a great post up about the codefest), and see old friends from NCEAS and DataONE, among many more. This 2.5 day event ran smoothly thanks to the leadership of Matt...

rOpenSci at NESCent Open Tree of Life Hackathon

David Winter — August 15, 2014
The Open Tree of Life project aims to synthesize our combined knowledge of how organisms relate to each other, and make the results available to anyone who wants to use them. At present, the project contains data from more than 4,000 published phylogenies, which combine with other data sources to make a tree that covers 2.5 million species. In September, the Open Tree of Life team are holding a hackathon to develop tools that use...

Announcing our ambassadors program

Karthik Ram — August 11, 2014
In the last 12 months we traveled all over the world delivering talks and hands-on workshops at various conferences and universities. This was a great opportunity for us to raise awareness for the project and get more of you involved as contributors and collaborators. As we scale the project to the next level, we need your help in spreading the message. Today we would like to officially announce the rOpenSci Ambassadors program. To facilitate...

Community conversations and a new package for full text

Scott Chamberlain, Karthik Ram — August 8, 2014
UPDATE: Use the new discussion forum at http://discuss.ropensci.org/ Community Community is at the heart of rOpenSci. We couldn't have accomplished most of our work without help from various contributors and users. Most of our discussions with the broader community over the past year have been through twitter or one-on-one conversations. However, we would like to foster more open ended and deeper discussions with our community. To this end, we are resurrecting our public Google group...

NCEAS Codefest

Scott Chamberlain — August 6, 2014
We're delighted to be sponsoring the upcoming Open Science Codefest in Santa Barbara, California, alongside RENCI, NCEAS, NSF, DataONE, and Mozilla Science Lab. The Open Science Codefest's goal is to gather researchers from across ecology, biodiversity science, and other earth and environmental sciences with programmer types to collaborate on coding projects. The ideas for the event so far include not just coding projects with the end result being software, but conversations on particular topics that...

Changes in rnoaa v0.2.0

Scott Chamberlain — July 21, 2014
We just released v0.2 of rnoaa. For details on the update, see the release notes. What follows are some notes on the more important changes. Updating to v0.2 Install rnoaa from CRAN install.packages("rnoaa") or Github devtools::install_github("ropensci/rnoaa") Then load rnoaa library("rnoaa") UI changes We changed almost all function names to have a more intuitive programmatic user interface (or UI). We changed all noaa*() functions to ncdc*() - these work only with NOAA National Climatic Data Center...

rOpenSci awarded $300k from the Sloan Foundation

Karthik Ram — June 10, 2014
We're delighted to announce that we have received additional funding from the Sloan Foundation to continue and expand our efforts from the past year. We're grateful for the overwhelming support from the community, especially through engagement at various events we organized and attended this past year. Over the next year we plan to: advance not only the technical infrastructure for accessing, managing, and synthesizing large and heterogeneous data, but also the social infrastructure of research...

Reproducible research is still a challenge

Rich FitzJohn, Matt Pennell, Amy Zanne, Will Cornwell — June 9, 2014
Science is reportedly in the middle of a reproducibility crisis. Reproducibility seems laudable and is frequently called for (e.g., nature and science). In general the argument is that research that can be independently reproduced is more reliable than research that cannot be independently reproduced. It is also worth noting that reproducing research is not solely a checking process, and it can provide useful jumping-off points for future research questions. It is difficult to find a...

taxize v0.3.0 update - a new data source, taxonomy in writing, and uBio examples

Scott Chamberlain — May 20, 2014
We just released v0.3 of taxize. For details on the update, see the release notes. Some new features New function iplant_resolve() to do name resolution using the iPlant name resolution service. Note, this is different from http://taxosaurus.org/ that is wrapped in the tnrs() function. New function ipni_search() to search for names in the International Plant Names Index (IPNI). See below for more. New function resolve() that unifies name resolution services from iPlant's name resolution service...

rOpenHack report

Karthik Ram — May 14, 2014
The rOpenSci project is a poster child for the fluid collaboration that has become increasingly common these days thanks to platforms like Twitter and GitHub. It has been really inspiring to see open discussions take shape as rough ideas, which rapidly turn into prototype research software, all of which are now happening on the order of a few days to weeks rather than months to years. The origins of this project itself lead back to a...

Overlaying species occurrence data with climate data

Ted Hart — April 22, 2014
One of the goals of rOpenSci is to facilitate interoperability between different data sources around the web with our tools. We can achieve this by providing functionality within our packages that converts data coming down via web APIs in one format (often a provider specific schema) into a standard format. The new version of rWBclimate that we just posted to CRAN does just that. In an earlier post I wrote about how users could combine...

Make your ggplots shareable, collaborative, and with D3

Matt Sundquist — April 17, 2014
Editor's note: This is a guest post by Matt Sundquist from Plot.ly. You can access the source code for this post at https://gist.github.com/sckott/10991885 Ggplotly and Plotly's R API let you make ggplot2 plots, add py$ggplotly(), and make your plots interactive, online, and drawn with D3. Let's make some. 1. Getting Started and Examples Here is Fisher's iris data. library("ggplot2") ggiris <- qplot(Petal.Width, Sepal.Length, data = iris, color = Species) print(ggiris) Let's make it in Plotly....

Topic Modeling In R

Carson Sievert — April 16, 2014
Editor's note: This is the first in a series of posts from rOpenSci's recent hackathon. I recently had the pleasure of participating in rOpenSci's hackathon. To be honest, I was quite nervous to work among such notables, but I immediately felt welcome thanks to a warm and personable group. Alyssa Frazee has a great post summarizing the event, so check that out if you haven't already. Once again, many thanks to rOpenSci for making it...

The ins and outs of interacting with web APIs

Core Team — April 14, 2014
We've received a number of questions from our users about dealing with the finer details of data sources on the web. Whether you're reading data from local storage such as a csv file, a .Rdata store, or possibly a proprietary file format, you've most likely run into some issues in the past. Common problems include passing incorrect paths, files being too big for memory, or requiring several packages to read files in incompatible formats. Reading...

Accessing iNaturalist data

Ted Hart — March 26, 2014
The iNaturalist project is a really cool way to both engage people in citizen science and collect species occurrence data. The premise is pretty simple: users download an app for their smartphone, and then can easily georeference any specimen they see, uploading it to the iNaturalist website. It lets users turn casual observations into meaningful crowdsourced species occurrence data. They also provide a nice robust API to access almost all of their data. We've...

Species occurrence data

March 17, 2014
UPDATE: mapping functions are in a separate package now (mapr). Examples that do mapping below have been updated. The rOpenSci project aims to provide programmatic access to scientific data repositories on the web. A vast majority of the packages in our current suite retrieve some form of biodiversity or taxonomic data. Since several of these datasets have been georeferenced, it provides numerous opportunities for visualizing species distributions, building species distribution maps, and for using it...

rnoaa - Access to NOAA National Climatic Data Center data

Scott Chamberlain — March 13, 2014
We recently pushed the first version of rnoaa to CRAN - version 0.1. NOAA has a lot of data, some of which is provided via the National Climatic Data Center, or NCDC. NOAA has provided access to NCDC climate data via a RESTful API - which is great because people like us can create clients for different programming languages to access their data programmatically. If you are so inclined to write a bit of R...

dvn - Sharing Reproducible Research from R

Thomas Leeper — February 20, 2014
Reproducible research involves the careful, annotated preservation of data, analysis code, and associated files, such that statistical procedures, output, and published results can be directly and fully replicated. As the push for reproducible research has grown, the R community has responded with an increasingly large set of tools for engaging in reproducible research practices (see, for example, the ReproducibleResearch Task View on CRAN). Most of these tools focus on improving one's own workflow through closer...

New features in the most recent taxize update, v0.2

Scott Chamberlain — February 19, 2014
We just released a new version of taxize - version 0.2.0. This release contains a number of new features and bug fixes. Here is a rundown of some of the changes: First, install and load taxize install.packages("taxize") library(taxize) New things New functions: class2tree Sometimes you just want to have a visual of the taxonomic relationships among taxa. If you don't know how to build a molecular phylogeny, don't have time, or there just isn't...

AntWeb - programmatic interface to ant biodiversity data

Karthik Ram — February 18, 2014
This post was updated on August 20, 2014, with AntWeb version 0.7.2.99. Please install an updated version to make sure the code works. Data on more than 10,000 species of ants recorded worldwide are available through the California Academy of Sciences' AntWeb, a repository that boasts a wealth of natural history data, digital images, and specimen records on ant species from a large community of museum curators. Digging through some of the earliest announcements of...

Changed and new things in the new version of rgbif, v0.5

Scott Chamberlain — February 17, 2014
rgbif is an R package to search and retrieve data from the Global Biodiversity Information Facility (GBIF). rgbif wraps R code around the GBIF API to allow you to talk to GBIF from R. We just pushed a new version of rgbif to CRAN - v0.5.0. Source and binary files are now available on CRAN. There are a few new functions: count_facet, elevation, and installations. These are described, with examples, below. Functions to work with...

Caching Encyclopedia of Life API calls

Scott Chamberlain — February 12, 2014
In a recent blog post we discussed caching calls to the web offline, on your own computer. Just like you can cache data on your own computer, a data provider can do the same thing. Most of the data providers we work with do not provide caching. However, at least one does: EOL, or Encyclopedia of Life. EOL allows you to set the amount of time (in seconds) that the call is cached, within which...

rOpenSci developer meeting in March

Karthik Ram — February 10, 2014
Our team has been cranking out a large number of tools over the past several months. As regular readers are aware, our software packages provide programmatic access to a diverse and extensive trove of scientific data. More recently we’ve expanded our efforts to build more general purpose and cross-domain tools. These include tools for reading, writing, integrating and publishing data, a unit testing platform for data, and a mapping engine that can visualize various kinds...

Caching API calls offline

Scott Chamberlain — February 3, 2014
I've recently heard about the idea of "offline first", especially via Hood.ie. We of course don't do web development, but primarily build R interfaces to data on the web. Internet availability is increasingly ubiquitous, but there still are times and places where you don't have internet, but need to get work done. In the R packages we write there are generally two steps to every workflow: Make a call to the web to request data and...

Introducing the ecoengine package

Karthik Ram — January 30, 2014
Natural history museums have long been valuable repositories of data on species diversity. These data have been critical for fostering and shaping the development of fields such as biogeography and systematics. These data repositories are becoming increasingly important, especially in the context of climate change, where a strong understanding of how species responded to past climate is key to understanding their responses in the future. Leading the way in opening up such...

solr - an R interface to Solr

Scott Chamberlain — January 27, 2014
A number of the APIs we interact with (e.g., PLOS full text API, and USGS's BISON API in rplos and rbison, respectively) expose Solr endpoints. Solr is an Apache hosted project - it is a powerful search server. Given that at least two, and possibly more in the future, of the data providers we interact with provide Solr endpoints, it made sense to create an R package to make robust functions to interact with Solr...

Highlighting text in text mining

Scott Chamberlain — December 2, 2013
rplos is an R package to facilitate easy search and full-text retrieval from all Public Library of Science (PLOS) articles, and we have a little feature that we aren't sure is useful or not. I don't actually do any text-mining for my research, so perhaps text-mining folks can give some feedback. You can quickly get a lot of results back using rplos, so perhaps it is useful to quickly browse what you got. What better...

Open Science with R

Karthik Ram — December 2, 2013
Upcoming Book on Open Science with R We're pleased to announce that the rOpenSci core team has just signed a contract with CRC Press/Taylor and Francis R series to publish a new book on practical ways to implement open science into your own research using R. Given all the talk about the importance of open science, the discussion often lacks practical suggestions on how one might actually incorporate these practices into their day to day...

rgbif changes in v0.4

Scott Chamberlain — November 21, 2013
The Global Biodiversity Information Facility (GBIF) is a warehouse of species occurrence data - collecting data from a lot of different sources. Our package rgbif allows you to interact with GBIF from R. We interact with GBIF via their Application Programming Interface, or API. Our last version on CRAN (v0.3) interacted with the older version of their API - this version interacts with the new version of their API. However, we also retained functions that...

taxize changes

Scott Chamberlain — November 19, 2013
We are building a taxonomic toolbelt for R called taxize - which gives you programmatic access to many sources of taxonomic data on the web. We just pushed a new version to CRAN (v0.1.5) with a lot of changes (see here for a rundown). Here are a few highlights of the changes. Note: the Windows binary may not be available yet... Install and load taxize install.packages("taxize") library(taxize) Taxonomic identifiers Each taxonomic service has their own...

Species occurrence data to CartoDB

Scott Chamberlain — November 4, 2013
We have previously written about creating interactive maps on the web from R, with the interactive maps on Github. See here, here, here, and here. A different approach is to use CartoDB, a freemium service with an SQL interface to your data tables that provides a map to visualize data in those tables. They released an R interface to their SQL API on Github here - which we can use to make an interactive map from...

Interactive maps with polygons using R, Geojson, and Github

Scott Chamberlain — October 23, 2013
Previously on this blog we have discussed making geojson maps and uploading to Github for interactive visualization with USGS BISON data, and with GBIF data, and on my own personal blog. This is done using a file format called geojson, a file format based on JSON (JavaScript Object Notation) in which you can specify geographic data along with any other metadata. In the two previous posts about geojson, I described how you could get data...

OA week - A simple use case for programmatic access to PLOS full text

Scott Chamberlain — October 22, 2013
Open access week is here! We love open access, and think it's extremely important to publish in open access journals. One of the many benefits of open access literature is that we likely can use the text of articles in OA journals for many things, including text-mining. What's even more awesome is some OA publishers provide API (application programming interface) access to their full text articles. Public Library of Science (PLOS) is one of these....

Altmetrics workshop recap

Scott Chamberlain — October 15, 2013
I attended the recent ALM Workshop 2013 and data challenge hosted by Public Library of Science (PLOS) in San Francisco. The workshop covered various issues having to do with altmetrics, or article-level metrics (ALM). The same workshop last year definitely had a feeling of we don't know x, y, and z, while the workshop this year felt like we know a lot more. There were many great talks - you can see the list of...

Guide to using rOpenSci packages during the US Gov't shutdown

Scott Chamberlain — October 8, 2013
With the US government shut down, many of the data APIs provided by the federal government are down. We write R packages to interact with many of these APIs. We have been tweeting about which of the APIs behind our R packages are down, but we thought we would write up a proper blog post on the issue. NCBI services are still up! NCBI is within NIH, which is within the Department of Health and Human...

Web Technologies and Services taskview is up on CRAN

Scott Chamberlain — October 3, 2013
Just a quick note that the Task View we have been working on with others, Web Technologies and Services, is up on CRAN now. Find it here http://cran.r-project.org/web/views/WebTechnologies.html. This is the first version - there are definitely changes to come. Changes are being suggested on Twitter as I write this... The draft version of the task view is on Github here if you want to file an issue. We use many packages to do stuff...

A new tutorials setup

Scott Chamberlain — October 3, 2013
To help you use rOpenSci packages we put tutorials up on our site at http://ropensci.org/tutorials. Up to now, we created them with a combination of raw html + converting code blocks to html and inserting them, etc. -- it was a slow process to update them when changes happened in our packages. So we thought of a better plan... Recently CRAN started accepting R package vignettes (basically, tutorials built in to packages) in R Markdown format....

A task view for interacting with the web from R

Scott Chamberlain — September 11, 2013
There is an increasing set of R packages for interacting with the web from R, whether it be the low level tools to interact with the web via http (see RCurl and httr), parsing data from the web (like RJSONIO and XML), or wrappers to web APIs that provide data (like twitteR). Most of you probably know about CRAN Task Views that aggregate information about R packages and functions on a particular subject area into...

Use cases as an interface to tool discovery

Scott Chamberlain — September 10, 2013
Good discovery tools for software are important because they can speed up software development: bugs are found and squashed and new features added more quickly, and users find the software they need faster. We have a page on our website for our packages that provides an overview of the packages we have, with descriptions and links. Two other ways to discover things include A gallery of examples, or use cases, in which the entry...

Working with climate data from the web in R

Scott Chamberlain — August 18, 2013
I recently attended ScienceOnline Climate, a conference in Washington, D.C. at AAAS. You may have heard of the ScienceOnline annual meeting in North Carolina - this was one of their topical meetings focused on Climate Change. I moderated a session on working with data from the web in R, focusing on climate data. Search Twitter for #scioClimate for tweets from the conference, and #sciordata for tweets from the session I ran. The following is an...

NOAA climate sparklines

Scott Chamberlain — August 5, 2013
We have started a new R package interacting with NOAA climate data called rnoaa. You can find our package in development here and documentation for NOAA web services here. It is still early days for this package, but we wanted to demo what you can do with the package. In this example, we search for stations that collect climate data, then get the data for those stations, pull out only the precipitation data, then get...

Consuming article-level metrics

Scott Chamberlain — August 1, 2013
We recently had a paper come out in a special issue on article-level metrics in the journal Information Standards Quarterly. Our paper basically compared article-level metrics provided by different aggregators. The other papers covered various article-level metrics topics from folks at PLOS, Mendeley, and more. Get our paper here. To get data from the article-level metrics providers we used one R package we created to get DOIs for PLOS articles (rplos) and three R packages...

Overlaying climate data with species occurrence data

Ted Hart — July 29, 2013
One of our primary goals at rOpenSci is to wrap as many science APIs as possible. While each package can be used as a standalone interface, there are lots of ways our packages can overlap and complement each other. Sure He-Man usually rode Battle Cat, but there's no reason he couldn't ride a My Little Pony sometimes too. That's the case with our packages for GBIF and the World Bank climate data API. Both packages will give...

rOpenSci at ESA 2013

Karthik Ram — July 29, 2013
It's the last week in July and this means that ecologists across North America (and elsewhere) are busy returning from the field and preparing their presentations and posters in anticipation of the annual Ecological Society of America meeting. The entire rOpenSci dev team will be in attendance this year and we have several workshops, talks, and events planned out. The topics range from half-day workshops on open data, data visualization, reproducible research, to an entire...

Making maps of climate change

Ted Hart — July 19, 2013
A recent video on the PBS Ideas Channel posited that the discovery of climate change is humanity's greatest scientific achievement. It took synthesizing generations of data from thousands of scientists, and hundreds of thousands (if not more) of hours of computer time to run models at institutions all over the world. But how can the individual researcher get their hands on some of this data? Right now the World Bank provides access to global circulation model (GCM)...

Style GeoJSON

Scott Chamberlain — July 17, 2013
Previously on this blog and on my own personal blog, I have discussed how easy it is to create interactive maps on Github using a combination of R, git and Github. This is done using a file format called geojson, a file format based on JSON (JavaScript Object Notation) in which you can specify geographic data along with any other metadata. In my previous post on this blog about geojson, I described how you could...

From occurrence data to interactive maps on the web

Scott Chamberlain — July 4, 2013
We have a number of packages for getting species occurrence data: rgbif and rbison. The power of R is that you can pull down this occurrence data, manipulate the data, do some analyses, and visualize the data - all in one open source framework. However, when dealing with occurrence data on maps, it is often useful to be able to interact with the visualization. Github, a code hosting and collaboration site, now renders a particular...

Revisiting our USGS app

Scott Chamberlain — June 19, 2013
R has a reputation of not playing nice on the web. At rOpenSci, we write R packages to bring data from around the web into R on your local machine - so we mostly don't do any dev for the web. However, the United States Geological Survey (USGS) recently held an app competition - it was a good opportunity to play with R on the web. We won best overall app as described in an...

What we hope to accomplish with the new funding

Core Team — June 14, 2013
At rOpenSci's virtual HQ we're busy planning out several exciting projects for the coming year thanks to the generous 180k grant from Sloan. In the interest of maintaining transparency with our community here are additional details of what we hope to accomplish and how we'll measure our successes. We have also posted a full copy of our proposal over at figshare. Objectives for the year a) Focus on identifying shortcomings, strengthening our core products, and...

rOpenSci awarded 180K from The Sloan Foundation

Karthik Ram — June 12, 2013
Today we are pleased to announce that rOpenSci has been awarded a generous 180K grant from the Alfred P. Sloan Foundation. This funding will allow us to develop a whole new suite of tools and provide scientists with general purpose toolkits to access various kinds of scientific data. We will also be traveling a whole bunch this year and running workshops at several conferences and universities. If you'd like us to speak to your research...

BISON USGS species occurrence data

Scott Chamberlain — May 27, 2013
The USGS recently released a way to search for and get species occurrence records for the USA. The service is called BISON (Biodiversity Information Serving Our Nation). The service has a web interface for human interaction in a browser, and two APIs (application programming interfaces) to allow machines to interact with their database. One of the APIs allows you to search and retrieve data, and the other gives back maps as either a heatmap or...

rOpenSci updates on packages and the website

Scott Chamberlain — May 20, 2013
We've been busy We have been busy hacking away at code and our website. Here is an update on what we've been up to. Packages rplos/alm PLoS provides two different API services: the Search API and ALM API. As their names suggest, the search API lets you search and get text from their papers and associated metadata. The ALM API allows you to get article level metrics data on PLoS papers. Up until a few...

Facilitating Open Science with Python

Steve Moss — May 16, 2013
A guest blog post by Steve Moss Why Python? A little background! I started using Python in the summer of 2010. I had applied for the Master of Research postgraduate degree in Computational Biology at the University of York. They teach the programming portion of their course using Python. I thought it might be useful to learn it, before starting, to give me a bit of a head start. From the beginning, it was clear...

Introducing the BEFData package

Karthik Ram — May 10, 2013
This is a guest post by Claas-Thido Pfaff We here present the BEFdata R package as part of the rOpenSci project. It is an API package that combines the strengths of the BEFdata portal in handling small, complex datasets with the powerful statistics package R. The portal itself is free software as well and can be found here. The BEFdata platforms support interdisciplinary data sharing and harmonisation of distributed research projects collaborating with each other....

USGS App Contest

Scott Chamberlain — April 22, 2013
Many US federal agencies are now running app competitions to highlight their web services (see here), and hopefully get people to build cool stuff using government data (see Data.gov for more). See here for a nice list of the US government's web services. One of these agencies was the United States Geological Survey (USGS). They opened up an app competition and we won best overall app! Check out our app called TaxaViewer here: http://glimmer.rstudio.com/ropensci/usgs_app/. We...

Use case - how to get species occurrence data from GBIF for a genus

Scott Chamberlain — April 12, 2013
Real use cases from people using our software are awesome. They are important for many reasons: 1) They make the code more usable because we may change code to make the interface and output easier to understand; 2) They may highlight bugs in our code; and 3) They show us what functions users care the most about (if we can assume number of questions equates to use). If someone has a question, others are likely...

Scholarly metadata in R

Scott Chamberlain — March 15, 2013
Scholarly metadata - the meta-information surrounding articles - can be super useful. Although metadata does not contain the full content of articles, it contains a lot of useful information, including title, authors, abstract, URL to the article, etc. One of the largest sources of metadata is provided via the Open Archives Initiative Protocol for Metadata Harvesting or OAI-PMH. Many publishers provide their metadata through their own endpoint, and implement the standard OAI-PMH methods: GetRecord, Identify,...

Visualizing rOpenSci collaboration

Scott Chamberlain — March 8, 2013
We have been writing code for R packages for a couple years, so it is time to take a look back at the data. What data you ask? The commits data from GitHub ~ data that records who did what and when. Using the Github commits API we can gather data on who committed code to a Github repository, and when they did it. Then we can visualize this historical record. Install some functions for...

is.invasive()

Scott Chamberlain — November 26, 2012
The following is a guest post from Ignasi Bartomeus, originally posted on his blog on 26 Nov, 2012. Check out a related blog post here. Note the functionality discussed in this post is now in our taxize package under the function gisd_isinvasive. We hacked out a quick Shiny app so you can play around with the below function in taxize on the web to get invasive status and plot it on a phylogeny. Check it...