rOpenSci Blog

The Value of #Welcome

Stefanie Butland — July 18, 2017
I’m participating in the AAAS Community Engagement Fellows Program (CEFP), funded by the Alfred P. Sloan Foundation. The inaugural cohort of Fellows is made up of 17 community managers working in a wide range of scientific communities. This is cross-posted from the Trellis blog as part of a series of reflections that the CEFP Fellows are sharing. In my training as a AAAS Community Engagement Fellow, I hear repeatedly about the value of extending a...

skimr for useful and tidy summary statistics

Eduardo Arino de la Rubia, Shannon Ellis, Julia Stewart Lowndes, Hope McLeod, Amelia McNamara, Michael Quinn, Elin Waring, Hao Zhu — July 11, 2017
Like every R user who uses summary statistics (so, everyone), our team has to rely on some combination of summary functions beyond summary() and str(). But we found them all lacking in some way because they can be generic, they don't always provide easy-to-operate-on data structures, and they are not pipeable. What we wanted was a frictionless approach for quickly skimming useful and tidy summary statistics as part of a pipeline. And so at rOpenSci...

Announcing the rOpenSci Fellowships Program

Karthik Ram — July 6, 2017
rOpenSci's mission is to promote a culture of open, transparent, and reproducible research across various research domains. Everything we do, from developing high-quality open-source software for data science and software review to building community through events like our community calls and annual unconference, is geared toward lowering barriers to reproducible, open science. The rOpenSci Fellowship presents a unique opportunity for researchers who are engaged in open source to have a bigger voice in their...

Launching webrockets at runconf17

Alicia Schep, Miles McBain — July 5, 2017
We, Alicia Schep and Miles McBain, drove the webrockets project at #runconf17. To make progress we solicited code, advice, and entertaining anecdotes from a host of other attendees, whom we humbly thank for helping to make our project possible. This post is divided into two sections: First up we'll relate our experiences, prompted by some questions we wrote for one another. Second, we'll put the webrockets package into context and walk you through a fun...

Introducing our Postdoctoral Fellow, Dr. Dan Sholler

Stefanie Butland, Karthik Ram — June 30, 2017
We are pleased to welcome our Postdoctoral Fellow, Dr. Dan Sholler. Dan is an expert in qualitative research (yes, you read that correctly) and studies digital infrastructure creation, growth, and maintenance efforts. Through this research interest, he was drawn to the open science community and its ongoing development of tools and communities to support sustainable, reproducible, high-quality research. With rOpenSci, he intends to investigate what drives scientists to engage with or resist open science tools...

packagemetrics - Helping you choose a package since runconf17

Becca Krouse, Erin Grand, Hannah Frick, Lori Shepherd, Sam Firke, William Ampeh — June 27, 2017
Before everybody made their way to the unconf via LAX and Lyft, attendees discussed potential project ideas online. The packagemetrics package was our answer to two related issues. The first proposal centered on creating and formatting tables in a reproducible workflow. After many different package suggestions started pouring in, we were left with a classic R user conundrum: "Which package do I choose?" With over 10,000 packages on CRAN - and thousands more on GitHub...

Hey! You there! You are welcome here

Shannon E. Ellis — June 23, 2017
What's that? You've heard of R? You use R? You develop in R? You know someone else who's mentioned R? Oh, you're breathing? Well, in that case, welcome! Come join the R community! We recently had a group discussion at rOpenSci's #runconf17 in Los Angeles, CA about the R community. I initially opened the issue on GitHub. After this issue was well-received (check out the emoji-love below!), we realized people were keen to talk about...

Tackling the Research Compendium at runconf17

Noam Ross, Alice Daish, Laura DeCicco, Molly Lewis, Nistara Randhawa, Jennifer Thompson, Nick Tierney — June 20, 2017
Two years ago at #runconf15, there was a great discussion about best practices for organizing R-based analysis projects that yielded a nice guidance document describing research compendia. Compendia, as we described them, were minimal products of reproducible research, using parts of R package structure to organize the inputs, analyses, and outputs of research projects. Since then, we've seen more examples and models of research compendia emerge (the organization of such projects is something of an...

New rOpenSci Packages for Text Processing in R

Jeroen Ooms — June 13, 2017
Textual data and natural language processing are still a niche domain within the R ecosystem. The NLP task view gives an overview of existing work; however, a lot of basic infrastructure is still missing. At the rOpenSci text workshop in April we discussed many ideas for improving text processing in R, which revealed several core areas that need improvement: Reading: better tools for extracting text and metadata from documents in various formats (doc, rtf, pdf,...

Unconf projects 5: mwparser, Gargle, arresteddev

Karthik Ram — June 9, 2017
And finally, we end our series of unconf project summaries (day 1, day 2, day 3, day 4). mwparser Summary: Wikimarkup is the language used on Wikipedia and similar projects, and as such contains a lot of valuable data both for scientists studying collaborative systems and people studying things documented on or in Wikipedia. mwparser parses wikimarkup, allowing a user to filter down to specific types of tags such as links or templates, and then...

Unconf projects 4: cityquant, notary, packagemetrics, pegax

Scott Chamberlain — June 8, 2017
Continuing our series of blog posts (day 1, day 2, day 3) this week about unconf 17. cityquant Summary: The goal with the cityquant project was to build a digital dashboard for sustainable cities. They also had a "spin-off" project called selfquant to get data from a quantified self Google Sheets template to keep track of weekly performance in various categories. Team: Reka Solymosi, Ben Best, Chelsea Ursaner, Tim Phan, Jasmine Dumas GitHub: https://github.com/ropenscilabs/cityquant notary...

Unconf projects 3: available, miner, rcheatsheet, ponyexpress

Karthik Ram — June 7, 2017
Continuing our series of blog posts (day 1, day 2) this week about unconf 17. available Summary: Ever have trouble naming your software package? Find a great name and realize it's already taken on CRAN, or further along in development on GitHub? The available package makes it easy to check for valid, available names, and also checks various sources for any unintended meanings. The package can also suggest names based on the description and title...

Unconf projects 2: checkers, gramr, data-packages, exploRingJSON

Scott Chamberlain — June 6, 2017
Following up on Stefanie's recap of unconf 17, we are spending this entire week sharing summaries of projects developed at the event. We plan to highlight 4-5 projects each day, with detailed posts from a handful of teams to follow. checkers Summary: checkers is a framework for reviewing analysis projects. It provides automated checks for best practices, using extensions on the goodpractice package. In addition, checkers includes a descriptive guide for best practices. Team:...

Unconf projects 1: skimr, emldown, testrmd, webrockets

Karthik Ram — June 5, 2017
Following up on Stefanie's recap of unconf 17, we are spending this entire week sharing summaries of projects developed at the event. We plan to highlight 4-5 projects each day, with detailed posts from a handful of teams to follow. skimr Summary: skimr, a package inspired by Hadley Wickham's precis package, aims to provide summary statistics iteratively and interactively as part of a pipeline. The package provides easily skimmable summary statistics to help you...

Bringing Together People and Projects at Unconf17

Stefanie Butland — June 2, 2017
We held our 4th annual unconference in Los Angeles, May 25-26, 2017. Scientists, R-software users and developers, and open data enthusiasts from academia, industry, government, and non-profits came together for two days to hack on projects they dreamed up and to give our online community an opportunity to connect in-person. The result? 70 people from 7 countries on 3 continents proposed 69 ideas leading to 21 projects in 2 days, and one awesome community just...

Easy linguistic mapping with lingtypology

George Moroz — May 16, 2017
Like all other types of visualization, linguistic mapping has two main goals: data presentation and data analysis. The most common purpose for which linguistic maps are used is simply pointing to the location of one or more languages of interest (presentation). A more sophisticated task is showing the distribution of particular linguistic features or their combination among languages of a certain area (presentation and analysis). There are three linguistic subdisciplines that use maps for visualization:...

Text Analysis R Developers' Workshop 2017

Ken Benoit — May 3, 2017
On 21-22 April, the London School of Economics hosted the Text Analysis Package Developers' Workshop, a two-day event held in London that brought together developers of R packages for working with text and text-related data. This included a wide range of applications, including string handling (stringi) and tokenization (the rOpenSci-onboarded tokenizers, KoNLP), corpus and text processing (readtext, tm, quanteda, and qdap), natural language processing (NLP) such as part of speech and dependency tagging (cleanNLP, spacyr),...

Chat with the rOpenSci team at upcoming meetings

Stefanie Butland — May 1, 2017
You can find members of the rOpenSci team at various meetings and workshops around the world. Come say 'hi', learn about how our packages can enable your research, or about our onboarding process for contributing new packages, discuss software sustainability or tell us how we can help you do open and reproducible research. Where's rOpenSci? When Who Where What May 1-3, 2017 Karthik Ram Portland, OR csv,conf,v3 May 9, 2017 Scott Chamberlain Portland, OR Exploring...

Welcome to our rOpenSci Interns

Scott Chamberlain, Stefanie Butland — April 27, 2017
There's a lot of work that goes into making software: the code that does the thing itself, unit testing, examples, tutorials, documentation, and support. rOpenSci software is created and maintained both by our staff and by our (awesome) community. In keeping with our aim to build capacity of software users and developers, three interns from our academic home at UC Berkeley are now working with us as well. Our interns are mentored by Carl...

Release 'open' data from their PDF prisons using tabulizer

Thomas J. Leeper — April 18, 2017
There is no problem in science quite as frustrating as other people's data. Whether it's malformed spreadsheets, disorganized documents, proprietary file formats, data without metadata, or any other data scenario created by someone else, scientists have taken to Twitter to complain about it. As a political scientist who regularly encounters so-called "open data" in PDFs, this problem is particularly irritating. PDFs may have "portable" in their name, making them display consistently on various platforms, but...

Data validation with the assertr package

Tony Fischetti — April 11, 2017
This is cross-posted from Tony's blog, onthelambda.com. Version 2.0 of my data set validation package assertr hit CRAN just this weekend. It has some pretty great improvements over version 1. For those new to the package, what follows is a short and new introduction. For those who are already using assertr, the text below will point out the improvements. I can (and do) go on and on about the treachery of messy/bad datasets. Though its...

Everybody talks about the weather

Adam Sparks — April 4, 2017
"Everybody talks about the weather, but nobody does anything about it." - Charles Dudley Warner. As a scientist who models plant diseases, I use a lot of weather data. Often this data is not available for areas of interest. Previously, I worked with the International Rice Research Institute (IRRI) and often the countries I was working with did not have weather data available or I was working on a large area covering several countries and...

camsRad, satellite-based time series of solar irradiation

Lukas Lundström — March 21, 2017
camsRad is a lightweight R client for the CAMS Radiation Service, which provides satellite-based time series of solar irradiation for the actual weather conditions as well as for clear-sky conditions. Satellite-based solar irradiation data have been around roughly as long as our modern-era satellites. But the price tag has been very high, in the range of several thousand euros per site. This has dampened research and development of downstream applications. With CAMS Radiation Service coming...

Release mongolite 1.0

Jeroen Ooms — March 10, 2017
After 2.5 years of development, version 1.0 of the mongolite package has been released to CRAN. The package is now stable, well documented, and will soon be submitted for peer review to be onboarded in the rOpenSci suite. MongoDB in R and mongolite I started working on mongolite in September 2014, and it was first announced at the rOpenSci unconf 2015. At this time, there were already two Mongo clients on CRAN: rmongodb (no longer...

Discover hydrological data using the hddtools R package

Claudia Vitolo — March 7, 2017
I've worked for over 12 years in hydrology and natural hazard modelling and one of the things that still fascinates me is the variety of factors that come into play in trying to predict phenomena such as river floods. From local observations of meteorological and hydrological variables and their spatio-temporal patterns to the type and condition of soils and vegetation/land use as well as the geometry and state of river channels and engineering structures affecting...

ropenaq, a breath of fresh air/R

Maëlle Salmon — February 21, 2017
Do you fancy open data, R, and breathing? Then you might be interested in ropenaq which provides access to open air quality data via OpenAQ! Also note that in French, R and air are homophones, therefore we French speakers can make puns like the one in the title. Please re-read it with a French accent and don't judge me. In this post I'll motivate the existence of the package, then show you the basics of...

Community Call v13 - How to ask questions so they get answered! Possibly by yourself!

Stefanie Butland — February 17, 2017
Our Community Call on Tuesday, March 7th, 8-9 AM PST, will cover "How to ask questions so they get answered! Possibly by yourself!". Asking questions about programming is a skill you can develop - we're not just born with it. The speakers will cover some of the background and skills you'll need to increase your chances of having your questions answered by your peers or by a busy expert. Join the Call Agenda Welcome (5...

From a million nested `ifelse`s to the plater package

Sean Hughes — February 6, 2017
As a lab scientist, I do almost all of my experiments in microtiter plates. These tools are an efficient means of organizing many parallel experimental conditions. It's not always easy, however, to translate between the physical plate and a useful data structure for analysis. My first attempts to solve this problem--nesting one ifelse call inside of the next to describe which well was which--were very unsatisfying. Over time, my attempts at solving the problem grew...

Apply to attend rOpenSci unconf 2017!

Stefanie Butland — February 2, 2017
For a fourth year running, we are excited to announce the rOpenSci unconference, our annual event loosely modeled on Foo Camp. We're organizing #runconf17 to bring together scientists, developers, and open data enthusiasts from academia, industry, government, and non-profits for a couple of days to hack on various projects and generally enrich our community. The agenda is mostly decided during the unconference itself. Past projects have related to open data, data visualization,...

Extracting and Enriching Ocean Biogeographic Information System (OBIS) Data with R

Tom Webb — January 25, 2017
Programmatic access to biodiversity data is revolutionising large-scale, reproducible biodiversity research. In the marine realm, the largest global database of species occurrence records is the Ocean Biogeographic Information System, OBIS. As of January 2017, OBIS contains 47.78 million occurrences of 117,345 species, all openly available and accessible via the OBIS API. The number of questions to address using these kinds of resources is as large as the number of investigators, but certain operations commonly crop...

Using xml schema and xslt in R

Jeroen Ooms — January 10, 2017
This week an update for xml2 and a new xslt package have appeared on CRAN. A full announcement for xml2 version 1.1 will appear on the rstudio blog. This post explains xml validation (via xsd schema) and xml transformation (via xslt stylesheets) which have been added in this release. XML schemas and stylesheets are not exactly new; both xslt 1.1 (2001) and xsd 1.0 (2004) have been available in browsers for over a decade. Revised...

A guide to sustainability models for research software projects

Daniel S. Katz — January 9, 2017
A research project often starts with a bright idea and an initial commitment of volunteer time, or perhaps, a fixed term grant. But what happens after that initial activity? How can the project continue to sustain itself? (We define sustainability as the capacity to endure. Software is sustainable if it will continue to be available in the future, on new platforms, and meeting new needs. [This is from slide 23 of http://www.slideshare.net/danielskatz/scientific-software-challenges-and-community-responses, though it may...

Our Community Manager Selected for AAAS Community Engagement Fellowship Program

Stefanie Butland — January 3, 2017
Next week I'll be in Washington DC to meet my peers in research community management as part of the inaugural class of the AAAS Community Engagement Fellowship Program! The program, funded by the Alfred P. Sloan Foundation, has a mission to improve community building and collaboration in scientific organizations and research collaborations by providing a year of training and support to a cohort of scientific community managers. The Fellowship will begin in January 2017 when...

Highlights and Resources from Community Call v12: How do I create a code of conduct for my event/lab/codebase?

Stefanie Butland — December 21, 2016
Our Community Call on December 15th covered a big topic in tech communities: "How do I create a code of conduct for my event/lab/codebase?". Here, we cover some of the key themes and considerations that arose from the discussion and point to curated resources and examples to follow when developing a code of conduct (CoC) for your community. Three guest speakers shared different perspectives. Dr Pauline Barmby talked about the process and lessons learned as...

Announcing our first fellowship awarded to Dr. Nick Golding

Stefanie Butland, Karthik Ram — December 12, 2016
rOpenSci's overarching mission is to promote a culture of transparent, open, and reproducible research across various scientific communities. All of our activities are geared towards lowering barriers to participation, and building a community of practitioners around the world. In addition to developing and maintaining a large suite of open source tools for data science, we actively support the research community with expert review on research software development, community calls, and hosting annual unconferences around the...

High Performance CommonMark and GitHub Markdown Rendering in R

Jeroen Ooms — December 2, 2016
This week the folks at GitHub have open sourced their fork of libcmark (based on the extensive PR by Mathieu Duponchelle), which they use to render markdown text within documents, issues, comments and anything else on the GitHub website. The new release of the commonmark R package incorporates this library so that we can take advantage of GitHub quality markdown rendering in R. The most exciting change is that the library has gained an extension...

The rOpenSci geospatial suite

Scott Chamberlain — November 22, 2016
Geospatial data - data embedded in a spatial context - is used across disciplines, whether it be history, biology, business, tech, public health, etc. Along with community contributors, we're working on a suite of tools to make working with spatial data in R as easy as possible. If you're not familiar with geospatial tools, it's helpful to see what people do with them in the real world. Example 1 One of our geospatial packages, geonames,...

The new Tesseract package: High Quality OCR in R

Jeroen Ooms — November 16, 2016
Optical character recognition (OCR) is the process of extracting written or typed text from images such as photos and scanned documents into machine-encoded text. The new rOpenSci package tesseract brings one of the best open-source OCR engines to R. This enables researchers or journalists, for example, to search and analyze vast numbers of documents that are only available in printed form. People looking to extract text and metadata from pdf files in R should try...

Chat with the rOpenSci team at upcoming meetings

Stefanie Butland — November 9, 2016
You can find members of the rOpenSci team at various meetings and workshops around the world. Come say 'hi', learn about how our packages can enable your research, or about our onboarding process for contributing new packages, discuss software sustainability or tell us how we can help you do open and reproducible research. Where's rOpenSci? November 2016 to February 2017 When Who Where What Nov 13–18, 2016 Dan Katz Salt Lake City, US SC16 Nov...

Community Call v12 - How do I create a code of conduct for my event/lab/codebase?

Stefanie Butland — October 31, 2016
In order to facilitate a transformation towards open and reproducible research, rOpenSci is building and improving not only the technical infrastructure, but the social infrastructure as well. To support this, occasionally a Community Call will focus on a topic that reflects the values of rOpenSci. The first of these, on Thursday, December 15th, 8-9 AM PST, will be on "How do I create a code of conduct for my event/lab/codebase?". Agenda Welcome (5 min, Stefanie...

Greetings from Your Community Manager!

Stefanie Butland — October 12, 2016
I feel both proud and privileged to join rOpenSci as your Community Manager. I’ve been a compulsive community builder since the early 2000s, but it has rarely been part of my job description. Now it seems like all roads have led to this. After a couple of fine days of indoctrination at the UC Berkeley home of rOpenSci, I’m settled into work in beautiful Kamloops, British Columbia, Canada. So much of my perspective of rOpenSci...

Postdoctoral Scholar – Sustainable Software and Reproducible Research

Karthik Ram — September 7, 2016
The rOpenSci project based at the University of California, Berkeley seeks to hire a postdoctoral scholar to work on the research activities funded by the grant titled “Fostering the next generation of sustainable software and reproducible research practices in the scientific community”. The project develops open source software to promote reproducible research practices in the scientific community. The postdoctoral scholar will focus on a research topic aligned with their own interests in order to better...

Advanced Image-Processing in R with Magick, Part I

Jeroen Ooms — August 23, 2016
The new magick package is an ambitious effort to modernize and simplify high-quality image processing in R. It wraps the ImageMagick STL which is perhaps the most comprehensive open-source image processing library available today. The ImageMagick library has an overwhelming amount of functionality. The current version of Magick exposes a decent chunk of it, but being a first release, documentation is still sparse. This post briefly introduces the most important concepts to get started. There...

New package tokenizers joins rOpenSci

Lincoln Mullen — August 23, 2016
The R package ecosystem for natural language processing has been flourishing in recent days. R packages for text analysis have usually been based on the classes provided by the NLP or tm packages. Many of them depend on Java. But recently there have been a number of new packages for text analysis in R, most notably text2vec, quanteda, and tidytext. These packages are built on top of Rcpp instead of rJava, which makes them much...

rotl paper published

Francois Michonneau, Joseph Brown, David Winter — July 26, 2016
We are excited to announce a paper describing rotl, our package for the Open Tree of Life data, has been published. The full citation is: Michonneau, F., Brown, J. W. and Winter, D. J. (2016), rotl: an R package to interact with the Open Tree of Life data. Methods Ecol Evol. doi: https://doi.org/10.1111/2041-210X.12593 The paper, which is freely available, describes the package and the data it wraps in detail. Rather than rehash the information here,...

Testing packages with R Travis for OS-X

Jeroen Ooms — July 12, 2016
Travis is a continuous integration service which allows for running automated testing code every time you push to GitHub. Hadley's book about R packages explains how and why R package authors should take advantage of this in their development process. The build matrix Travis is now providing support for multiple operating systems, including Ubuntu 14.04 (Trusty) and various flavors of Mac OS-X. Jim Hester has done a great job of tweaking the travis R-language build script...

Australia Unconference

Jessie Roberts, Miles McBain, Nicholas Tierney — June 16, 2016
On April 21st and 22nd of 2016, we had 40 members of the R community gather in Brisbane, Australia, with the goal of reproducing the rOpenSci Unconference events that have been running with great success in San Francisco since 2014. Like all event organisers ever, we went through the usual crisis: Where will it be? Will anyone actually show up? Is the problem space over venue, date, attendees, catering, sponsors convex? Is it even possible...

Software sustainability research with rOpenSci

Daniel S. Katz — May 25, 2016
I’m happy to announce that I’ve started a project with rOpenSci under their recent award from the Helmsley Foundation. My work with rOpenSci will focus on sustainability of the project itself. Sustainability can be defined as having the resources to do the necessary work to continue and grow rOpenSci. This is one of the most difficult challenges for rOpenSci and for many other research software projects. rOpenSci has a very broad and very ambitious goal,...

Onboarding at rOpenSci: A Year in Reviews

Noam Ross, Carl Boettiger, Jenny Bryan, Scott Chamberlain, Rich FitzJohn, Karthik Ram — March 28, 2016
Code review, in which peers manually inspect the source code of software written by others, is widely recognized as one of the best tools for finding bugs in software. Code review is relatively uncommon in scientific software development, though. Scientists, despite being familiar with the process of peer review, often have little exposure to code review due to lack of training and historically little incentive to share the source code from their research. So scientific...

rOpenSci geospatial libraries

Scott Chamberlain — March 17, 2016
Geospatial data input/output, manipulation, and visualization are tasks that are common to many disciplines. Thus, we're keenly interested in making great tools in this space. We have an increasing set of spatial tools, each of which we'll cover sparingly. See the CRAN and GitHub badges for more information. We are not trying to replace the current R geospatial libraries - rather, we're trying to fill in gaps and create smaller tools to make it easy...

We're hiring a community manager!

Core Team — March 10, 2016
The rOpenSci team is growing, thanks in part to our recent funding. We recently welcomed Jeroen Ooms on the software development side and today we're thrilled to announce a position for community manager. Our mission is to expand access to scientific data and promote a culture of reproducible research and sustainable research software. We aim to cultivate a vibrant and open community through activities such as our community calls, discussion forums, package review, and annual...

Australian rOpenSci Unconference

Nicholas Tierney — March 9, 2016
The rOpenSci Unconference is coming to Australia and we are excited!! The event will take place in sunny Brisbane on April 21-22, 2016, hosted at the Microsoft Innovation Centre. You can find more information about the event and how to register at http://auunconf.ropensci.org/. I was completely and unceremoniously thrown into the deep end when I first started learning R. Contrary to what I initially thought possible, I am now irreversibly converted to the ideology of...

Introducing pdftools - A fast and portable PDF extractor

Jeroen Ooms — March 1, 2016
Scientific articles are typically locked away in PDF format, a format designed primarily for printing but not so great for searching or indexing. The new pdftools package allows for extracting text and metadata from pdf files in R. From the extracted plain-text one could find articles discussing a particular drug or species name, without having to rely on publishers providing metadata, or pay-walled search engines. The pdftools package slightly overlaps with the Rpoppler package by Kurt...

Help us prioritize what to build in 2016

Karthik Ram — January 7, 2016
We've got a big year ahead of us as we work towards expanding our team and organizing various events and activities. We remain committed to supporting and expanding the landscape of open source tools that are available to researchers. While much of our focus has been around making it easier to access various data repositories, we are keen on improving other parts of the research pipeline, including data munging, documentation and sharing. To help us...

rOpenSci Announces $2.9M Award from the Helmsley Charitable Trust

Karthik Ram — November 19, 2015
rOpenSci, whose mission is to develop and maintain sustainable software tools that allow researchers to access, visualize, document, and publish open data on the Web, is pleased to announce that it has been awarded a grant of nearly $2.9 million over three years from The Leona M. and Harry B. Helmsley Charitable Trust. The grant, which was awarded through the Trust’s Biomedical Research Infrastructure Program, will be used to expand rOpenSci’s mission of developing tools...

Rentrez 1_0 released

David Winter — September 24, 2015
A new version of rentrez, our package for the NCBI's EUtils API, is making its way around the CRAN mirrors. This release represents a substantial improvement to rentrez, including a new vignette that documents the whole package. This post describes some of the new things in rentrez, and gives us a chance to thank some of the people that have contributed to this package's development. Thanks Thanks to everyone who has filed an issue or...

A drat repository for rOpenSci

Carl Boettiger — August 4, 2015
We're happy to announce the launch of a CRAN-style repository for rOpenSci at http://packages.ropensci.org. This repository contains the latest nightly builds from the master branch of all rOpenSci packages currently on GitHub. This allows users to install development versions of our software without specialized functions such as install_github(), allows dependencies not hosted on CRAN to still be resolved automatically, and permits the use of update.packages(). Using the repository To use, simply add packages.ropensci.org to your...

The challenge of combining 176 x #otherpeoplesdata to create the Biomass And Allometry Database

Daniel Falster, Rich FitzJohn, Remko Duursma, Diego Barneche — June 3, 2015
Despite the hype around "big data", a more immediate problem facing many scientific analyses is that large-scale databases must be assembled from a collection of small independent and heterogeneous fragments -- the outputs of many isolated scientific studies conducted around the globe. Collecting and compiling these fragments is challenging at both political and technical levels. The political challenge is to manage the carrots and sticks needed to promote sharing of data within the scientific...

Database interfaces

Scott Chamberlain — May 20, 2015
There are many different databases. The most familiar are row-column SQL databases like MySQL, SQLite, or PostgreSQL. Another type of database is the key-value store, which as a concept is very simple: you save a value specified by a key, and you can retrieve a value by its key. One more type is the document database, which instead of storing rows and columns, stores blobs of text or even binary files. The key-value and document...

Introducing a Wishlist for Scientific R Packages

Oliver Keyes — March 10, 2015
There are two things that make R such a wonderful programming environment - the vast number of packages to access, process and interpret data, and the enthusiastic individuals and subcommunities (of which rOpenSci is a great example). One, of course, flows from the other: R programmers write R packages to provide language users with more features, which makes everyone's jobs easier and (hopefully!) attracts more users and more contributions. But what if you have an...

Curling - exploring web request options

Scott Chamberlain — December 18, 2014
rOpenSci specializes in creating R libraries for accessing data resources on the web from R. Most times you request data from the web in R with our packages, you should have no problem. However, you eventually will run into problems. In addition, there are advanced things you can do when modifying requests to web resources. Underlying almost all of our packages are requests to web resources served over the...

Community calls

Scott Chamberlain — December 15, 2014
Key to the success of rOpenSci is our community and we want to hear more regularly from our members, and foster new interactions among the group. In addition, community calls are a way for us to give important updates, and get feedback on them. We tentatively plan on doing community calls once per month. The format of rOpenSci community calls could be of various types. We could have community members show off software they've been...

Growth of open data in biology

Scott Chamberlain — November 10, 2014
Why open data growth At rOpenSci we try to make it easier for people to use open data and contribute open data to the community. The question often arises: How much open data do we have? Another angle on this topic is: How much is open data growing? We provide access to dozens of data repositories through our various packages. We asked many of them to share numbers on the amount of data they have,...

Introducing Rocker: Docker for R

Carl Boettiger, Dirk Eddelbuettel — October 23, 2014
You only know two things about Docker. First, it uses Linux containers. Second, the Internet won't shut up about it. -- attributed to Solomon Hykes, Docker CEO So what is Docker? Docker is a relatively new open source application and service, which is seeing interest across a number of areas. It uses recent Linux kernel features (containers, namespaces) to shield processes. While its use (superficially) resembles that of virtual machines, it is much more lightweight...

New fiscal sponsorship agreement with NumFocus foundation

Karthik Ram — October 1, 2014
I’m very pleased to announce that rOpenSci has signed a comprehensive fiscal sponsorship agreement with the NumFocus foundation, a 501(c)3 nonprofit that supports R&D for open source scientific software projects. We are delighted to be in the company of esteemed projects such as IPython and Julia that share our goal of promoting reproducible research practices across many scientific communities and developing a rich ecosystem of tools for open scientific computing. All of our activities, from...

NCEAS Codefest Follow-up

Scott Chamberlain, Ted Hart — September 23, 2014
The week after Labor Day, we had the pleasure of attending the NCEAS open science codefest event in Santa Barbara. It was great to meet folks like the new arrivals at the expanding Mozilla Science Lab, Bill Mills and Abby Cabunoc (Bill even already has a great post up about the codefest), and see old friends from NCEAS and DataONE, among many more. This 2.5-day event ran smoothly thanks to the leadership of Matt...

rOpenSci at NESCent Open Tree of Life Hackathon

David Winter — August 15, 2014
The Open Tree of Life project aims to synthesize our combined knowledge of how organisms relate to each other, and make the results available to anyone who wants to use them. At present, the project contains data from more than 4,000 published phylogenies, which combine with other data sources to make a tree that covers 2.5 million species. In September, the Open Tree of Life team are holding a hackathon to develop tools that use...

Announcing our ambassadors program

Karthik Ram — August 11, 2014
In the last 12 months we traveled all over the world delivering talks and hands-on workshops at various conferences and universities. This was a great opportunity for us to raise awareness for the project and get more of you involved as contributors and collaborators. As we scale the project to the next level, we need your help in spreading the message. Today we would like to officially announce the rOpenSci Ambassadors program. To facilitate...

Community conversations and a new package for full text

Scott Chamberlain, Karthik Ram — August 8, 2014
UPDATE: Use the new discussion forum at http://discuss.ropensci.org/ Community Community is at the heart of rOpenSci. We couldn't have accomplished most of our work without help from various contributors and users. Most of our discussions with the broader community over the past year have been through twitter or one-on-one conversations. However, we would like to foster more open ended and deeper discussions with our community. To this end, we are resurrecting our public Google group...

NCEAS Codefest

Scott Chamberlain — August 6, 2014
We're delighted to be sponsoring the upcoming Open Science Codefest in Santa Barbara, California, alongside RENCI, NCEAS, NSF, DataONE, and Mozilla Science Lab. The Open Science Codefest's goal is to gather researchers from across ecology, biodiversity science, and other earth and environmental sciences with programmer types to collaborate on coding projects. The ideas for the event so far include not just coding projects with the end result being software, but conversations on particular topics that...

Changes in rnoaa v0.2.0

Scott Chamberlain — July 21, 2014
We just released v0.2 of rnoaa. For details on the update, see the release notes. What follows are some notes on the more important changes. Updating to v0.2 Install rnoaa from CRAN install.packages("rnoaa") or Github devtools::install_github("ropensci/rnoaa") Then load rnoaa library("rnoaa") UI changes We changed almost all function names to have a more intuitive programmatic user interface (or UI). We changed all noaa*() functions to ncdc*() - these work only with NOAA National Climatic Data Center...

rOpenSci awarded $300k from the Sloan Foundation

Karthik Ram — June 10, 2014
We're delighted to announce that we have received additional funding from the Sloan Foundation to continue and expand our efforts from the past year. We're grateful for the overwhelming support from the community, especially through engagement at various events we organized and attended this past year. Over the next year we plan to: advance not only the technical infrastructure for accessing, managing, and synthesizing large and heterogeneous data, but also the social infrastructure of research...

Reproducible research is still a challenge

Rich FitzJohn, Matt Pennell, Amy Zanne, Will Cornwell — June 9, 2014
Science is reportedly in the middle of a reproducibility crisis. Reproducibility seems laudable and is frequently called for (e.g., nature and science). In general the argument is that research that can be independently reproduced is more reliable than research that cannot be independently reproduced. It is also worth noting that reproducing research is not solely a checking process, and it can provide useful jumping-off points for future research questions. It is difficult to find a...

taxize v0.3.0 update - a new data source, taxonomy in writing, and uBio examples

Scott Chamberlain — May 20, 2014
We just released v0.3 of taxize. For details on the update, see the release notes. Some new features New function iplant_resolve() to do name resolution using the iPlant name resolution service. Note, this is different from http://taxosaurus.org/ that is wrapped in the tnrs() function. New function ipni_search() to search for names in the International Plant Names Index (IPNI). See below for more. New function resolve() that unifies name resolution services from iPlant's name resolution service...

rOpenHack report

Karthik Ram — May 14, 2014
The rOpenSci project is a poster child for the fluid collaboration that has become increasingly common these days thanks to platforms like Twitter and GitHub. It has been really inspiring to see open discussions take shape as rough ideas, which rapidly turn into prototype research software, all of which are now happening on the order of a few days to weeks rather than months to years. The origins of this project itself lead back to a...

Overlaying species occurrence data with climate data

Ted Hart — April 22, 2014
One of the goals of rOpenSci is to facilitate interoperability between different data sources around the web with our tools. We can achieve this by providing functionality within our packages that converts data coming down via web APIs in one format (often a provider specific schema) into a standard format. The new version of rWBclimate that we just posted to CRAN does just that. In an earlier post I wrote about how users could combine...

Make your ggplots shareable, collaborative, and with D3

Matt Sundquist — April 17, 2014
Editor's note: This is a guest post by Matt Sundquist from Plot.ly. You can access the source code for this post at https://gist.github.com/sckott/10991885 Ggplotly and Plotly's R API let you make ggplot2 plots, add py$ggplotly(), and make your plots interactive, online, and drawn with D3. Let's make some. 1. Getting Started and Examples Here is Fisher's iris data. library("ggplot2") ggiris <- qplot(Petal.Width, Sepal.Length, data = iris, color = Species) print(ggiris) Let's make it in Plotly....

Topic Modeling In R

Carson Sievert — April 16, 2014
Editor's note: This is the first in a series of posts from rOpenSci's recent hackathon. I recently had the pleasure of participating in rOpenSci's hackathon. To be honest, I was quite nervous to work among such notables, but I immediately felt welcome thanks to a warm and personable group. Alyssa Frazee has a great post summarizing the event, so check that out if you haven't already. Once again, many thanks to rOpenSci for making it...

The ins and outs of interacting with web APIs

Core Team — April 14, 2014
We've received a number of questions from our users about dealing with the finer details of data sources on the web. Whether you're reading data from local storage such as a csv file, a .Rdata store, or possibly a proprietary file format, you've most likely run into some issues in the past. Common problems include passing incorrect paths, files being too big for memory, or requiring several packages to read files in incompatible formats. Reading...

Accessing iNaturalist data

Ted Hart — March 26, 2014
The iNaturalist project is a really cool way to both engage people in citizen science and collect species occurrence data. The premise is pretty simple: users download an app for their smartphone, and then can easily georeference any specimen they see, uploading it to the iNaturalist website. It lets users turn casual observations into meaningful crowdsourced species occurrence data. They also provide a nice robust API to access almost all of their data. We've...

Species occurrence data

March 17, 2014
UPDATE: mapping functions are in a separate package now (mapr). Examples that do mapping below have been updated. The rOpenSci project aims to provide programmatic access to scientific data repositories on the web. A vast majority of the packages in our current suite retrieve some form of biodiversity or taxonomic data. Since several of these datasets have been georeferenced, it provides numerous opportunities for visualizing species distributions, building species distribution maps, and for using it...

rnoaa - Access to NOAA National Climatic Data Center data

Scott Chamberlain — March 13, 2014
We recently pushed the first version of rnoaa to CRAN - version 0.1. NOAA has a lot of data, some of which is provided via the National Climatic Data Center, or NCDC. NOAA has provided access to NCDC climate data via a RESTful API - which is great because people like us can create clients for different programming languages to access their data programmatically. If you are so inclined to write a bit of R...

dvn - Sharing Reproducible Research from R

Thomas Leeper — February 20, 2014
Reproducible research involves the careful, annotated preservation of data, analysis code, and associated files, such that statistical procedures, output, and published results can be directly and fully replicated. As the push for reproducible research has grown, the R community has responded with an increasingly large set of tools for engaging in reproducible research practices (see, for example, the ReproducibleResearch Task View on CRAN). Most of these tools focus on improving one's own workflow through closer...

New features in the most recent taxize update, v0.2

Scott Chamberlain — February 19, 2014
We just released a new version of taxize - version 0.2.0. This release contains a number of new features, and bug fixes. Here is a rundown of some of the changes: First, install and load taxize install.packages("taxize") library(taxize) New things New functions: class2tree Sometimes you just want to have a visual of the taxonomic relationships among taxa. If you don't know how to build a molecular phylogeny, don't have time, or there just isn't...

AntWeb - programmatic interface to ant biodiversity data

Karthik Ram — February 18, 2014
This post was updated on August 20, 2014, with AntWeb version 0.7.2.99. Please install an updated version to make sure the code works. Data on more than 10,000 species of ants recorded worldwide are available through the California Academy of Sciences' AntWeb, a repository that boasts a wealth of natural history data, digital images, and specimen records on ant species from a large community of museum curators. Digging through some of the earliest announcements of...

Changed and new things in the new version of rgbif, v0.5

Scott Chamberlain — February 17, 2014
rgbif is an R package to search and retrieve data from the Global Biodiversity Information Facility (GBIF). rgbif wraps R code around the GBIF API to allow you to talk to GBIF from R. We just pushed a new version of rgbif to CRAN - v0.5.0. Source and binary files are now available on CRAN. There are a few new functions: count_facet, elevation, and installations. These are described, with examples, below. Functions to work with...

Caching Encyclopedia of Life API calls

Scott Chamberlain — February 12, 2014
In a recent blog post we discussed caching calls to the web offline, on your own computer. Just like you can cache data on your own computer, a data provider can do the same thing. Most of the data providers we work with do not provide caching. However, at least one does: EOL, or Encyclopedia of Life. EOL allows you to set the amount of time (in seconds) that the call is cached, within which...

rOpenSci developer meeting in March

Karthik Ram — February 10, 2014
Our team has been cranking out a large number of tools over the past several months. As regular readers are aware, our software packages provide programmatic access to a diverse and extensive trove of scientific data. More recently we’ve expanded our efforts to build more general purpose and cross-domain tools. These include tools for reading, writing, integrating and publishing data, a unit testing platform for data, and a mapping engine that can visualize various kinds...

Caching API calls offline

Scott Chamberlain — February 3, 2014
I've recently heard the idea of "offline first", especially via Hood.ie. We of course don't do web development, but primarily build R interfaces to data on the web. Internet availability is increasingly ubiquitous, but there still are times and places where you don't have internet, but need to get work done. In the R packages we write there are generally two steps to every workflow: Make a call to the web to request data and...

Introducing the ecoengine package

Karthik Ram — January 30, 2014
Natural history museums have long been valuable repositories of data on species diversity. These data have been critical for fostering and shaping the development of fields such as biogeography and systematics. These data repositories are becoming increasingly important, especially in the context of climate change, where a strong understanding of how species responded to past climate is key to understanding their responses in the future. Leading the way in opening up such...

solr - an R interface to Solr

Scott Chamberlain — January 27, 2014
A number of the APIs we interact with (e.g., PLOS full text API, and USGS's BISON API in rplos and rbison, respectively) expose Solr endpoints. Solr is an Apache hosted project - it is a powerful search server. Given that at least two, and possibly more in the future, of the data providers we interact with provide Solr endpoints, it made sense to create an R package to make robust functions to interact with Solr...

Highlighting text in text mining

Scott Chamberlain — December 2, 2013
rplos is an R package to facilitate easy search and full-text retrieval from all Public Library of Science (PLOS) articles, and we have a little feature that we aren't sure is useful or not. I don't actually do any text-mining for my research, so perhaps text-mining folks can give some feedback. You can quickly get a lot of results back using rplos, so perhaps it is useful to quickly browse what you got. What better...

Open Science with R

Karthik Ram — December 2, 2013
Upcoming Book on Open Science with R We're pleased to announce that the rOpenSci core team has just signed a contract with CRC Press/Taylor and Francis R series to publish a new book on practical ways to implement open science into your own research using R. Given all the talk about the importance of open science, the discussion often lacks practical suggestions on how one might actually incorporate these practices into their day to day...

rgbif changes in v0.4

Scott Chamberlain — November 21, 2013
The Global Biodiversity Information Facility (GBIF) is a warehouse of species occurrence data - collecting data from a lot of different sources. Our package rgbif allows you to interact with GBIF from R. We interact with GBIF via their Application Programming Interface, or API. Our last version on CRAN (v0.3) interacted with the older version of their API - this version interacts with the new version of their API. However, we also retained functions that...

taxize changes

Scott Chamberlain — November 19, 2013
We are building a taxonomic toolbelt for R called taxize - which gives you programmatic access to many sources of taxonomic data on the web. We just pushed a new version to CRAN (v0.1.5) with a lot of changes (see here for a rundown). Here are a few highlights of the changes. Note: the windows binary may not be available yet... Install and load taxize install.packages("taxize") library(taxize) Taxonomic identifiers Each taxonomic service has their own...

Species occurrence data to CartoDB

Scott Chamberlain — November 4, 2013
We have previously written about creating interactive maps on the web from R, with the interactive maps on Github. See here, here, here, and here. A different approach is to use CartoDB, a freemium service with a SQL interface to your data tables that provides a map to visualize data in those tables. They released an R interface to their SQL API on Github here - which we can use to make an interactive map from...

Interactive maps with polygons using R, Geojson, and Github

Scott Chamberlain — October 23, 2013
Previously on this blog we have discussed making geojson maps and uploading to Github for interactive visualization with USGS BISON data, and with GBIF data, and on my own personal blog. This is done using a file format called geojson, a file format based on JSON (JavaScript Object Notation) in which you can specify geographic data along with any other metadata. In the two previous posts about geojson, I described how you could get data...

OA week - A simple use case for programmatic access to PLOS full text

Scott Chamberlain — October 22, 2013
Open access week is here! We love open access, and think it's extremely important to publish in open access journals. One of the many benefits of open access literature is that we likely can use the text of articles in OA journals for many things, including text-mining. What's even more awesome is some OA publishers provide API (application programming interface) access to their full text articles. Public Library of Science (PLOS) is one of these....

Altmetrics workshop recap

Scott Chamberlain — October 15, 2013
I attended the recent ALM Workshop 2013 and data challenge hosted by Public Library of Science (PLOS) in San Francisco. The workshop covered various issues having to do with altmetrics, or article-level metrics (ALM). The same workshop last year definitely had a feeling of we don't know x, y, and z, while the workshop this year felt like we know a lot more. There were many great talks - you can see the list of...

Guide to using rOpenSci packages during the US Gov't shutdown

Scott Chamberlain — October 8, 2013
With the US government shut down, many of the federal government provided data APIs are down. We write R packages to interact with many of these APIs. We have been tweeting about which APIs are down related to R packages we make, but we thought we would write up a proper blog post on the issue. NCBI services are still up! NCBI is within NIH, which is within the Department of Health and Human...

Web Technologies and Services taskview is up on CRAN

Scott Chamberlain — October 3, 2013
Just a quick note that the Task View we have been working on with others, Web Technologies and Services, is up on CRAN now. Find it here http://cran.r-project.org/web/views/WebTechnologies.html. This is the first version - there are definitely changes to come. Changes are being suggested as I write this on Twitter... The draft version of the task view is on Github here if you want to file an issue. We use many packages to do stuff...

A new tutorials setup

Scott Chamberlain — October 3, 2013
To help you use rOpenSci packages we put tutorials up on our site at http://ropensci.org/tutorials. Up to now, we created them with a combination of raw html + converting code blocks to html and inserting them, etc. -- it was a slow process to update them when changes happened in our packages. So we thought of a better plan... Recently CRAN started accepting R package vignettes (basically, tutorials built in to packages) in R Markdown format....

A task view for interacting with the web from R

Scott Chamberlain — September 11, 2013
There is an increasing set of R packages for interacting with the web from R, whether it be the low level tools to interact with the web via http (see RCurl and httr), parsing data from the web (like RJSONIO and XML), or wrappers to web APIs that provide data (like twitteR). Most of you probably know about CRAN Task Views that aggregate information about R packages and functions on a particular subject area into...

Use cases as an interface to tool discovery

Scott Chamberlain — September 10, 2013
Good discovery tools for software are important as they can facilitate the pace of software development, bugs are found and squashed and new features added more quickly, and users find software they need faster. We have a page on our website for our packages that provides an overview of the packages we have, with descriptions and links. Two other ways to discover things include A gallery of examples, or use cases, in which the entry...

Working with climate data from the web in R

Scott Chamberlain — August 18, 2013
I recently attended ScienceOnline Climate, a conference in Washington, D.C. at AAAS. You may have heard of the ScienceOnline annual meeting in North Carolina - this was one of their topical meetings focused on Climate Change. I moderated a session on working with data from the web in R, focusing on climate data. Search Twitter for #scioClimate for tweets from the conference, and #sciordata for tweets from the session I ran. The following is an...

NOAA climate sparklines

Scott Chamberlain — August 5, 2013
We have started a new R package interacting with NOAA climate data called rnoaa. You can find our package in development here and documentation for NOAA web services here. It is still early days for this package, but we wanted to demo what you can do with the package. In this example, we search for stations that collect climate data, then get the data for those stations, pull out only the precipitation data, then get...

Consuming article-level metrics

Scott Chamberlain — August 1, 2013
We recently had a paper come out in a special issue on article-level metrics in the journal Information Standards Quarterly. Our paper basically compared article-level metrics provided by different aggregators. The other papers covered various article-level metrics topics from folks at PLOS, Mendeley, and more. Get our paper here. To get data from the article-level metrics providers we used one R package we created to get DOIs for PLOS articles (rplos) and three R packages...

Overlaying climate data with species occurrence data

Ted Hart — July 29, 2013
One of our primary goals at rOpenSci is to wrap as many science APIs as possible. While each package can be used as a standalone interface, there are lots of ways our packages can overlap and complement each other. Sure He-Man usually rode Battle Cat, but there's no reason he couldn't ride a My Little Pony sometimes too. That's the case with our packages for GBIF and the World Bank climate data API. Both packages will give...

rOpenSci at ESA 2013

Karthik Ram — July 29, 2013
It's the last week in July and this means that ecologists across North America (and elsewhere) are busy returning from the field and preparing their presentations and posters in anticipation of the annual Ecological Society of America meeting. The entire rOpenSci dev team will be in attendance this year and we have several workshops, talks, and events planned out. The topics range from half-day workshops on open data, data visualization, reproducible research, to an entire...

Making maps of climate change

Ted Hart — July 19, 2013
A recent video on the PBS Ideas Channel posited that the discovery of climate change is humanity's greatest scientific achievement. It took synthesizing generations of data from thousands of scientists, hundreds of thousands (if not more) of hours of computer time to run models at institutions all over the world. But how can the individual researcher get their hands on some of this data? Right now the World Bank provides access to global circulation model (GCM)...

Style GeoJSON

Scott Chamberlain — July 17, 2013
Previously on this blog and on my own personal blog, I have discussed how easy it is to create interactive maps on Github using a combination of R, git and Github. This is done using a file format called geojson, a file format based on JSON (JavaScript Object Notation) in which you can specify geographic data along with any other metadata. In my previous post on this blog about geojson, I described how you could...

From occurrence data to interactive maps on the web

Scott Chamberlain — July 4, 2013
We have a number of packages for getting species occurrence data: rgbif and rbison. The power of R is that you can pull down this occurrence data, manipulate the data, do some analyses, and visualize the data - all in one open source framework. However, when dealing with occurrence data on maps, it is often useful to be able to interact with the visualization. Github, a code hosting and collaboration site, now renders a particular...

Revisiting our USGS app

Scott Chamberlain — June 19, 2013
R has a reputation of not playing nice on the web. At rOpenSci, we write R packages to bring data from around the web into R on your local machine - so we mostly don't do any dev for the web. However, the United States Geological Survey (USGS) recently held an app competition - it was a good opportunity to play with R on the web. We won best overall app as described in an...

What we hope to accomplish with the new funding

Core Team — June 14, 2013
At rOpenSci's virtual HQ we're busy planning out several exciting projects for the coming year thanks to the generous 180k grant from Sloan. In the interest of maintaining transparency with our community here are additional details of what we hope to accomplish and how we'll measure our successes. We have also posted a full copy of our proposal over at figshare. Objectives for the year a) Focus on identifying shortcomings, strengthening our core products, and...

rOpenSci awarded 180K from The Sloan Foundation

Karthik Ram — June 12, 2013
Today we are pleased to announce that rOpenSci has been awarded a generous 180K grant from the Alfred P. Sloan foundation. This funding will allow us to develop a whole new suite of tools and provide scientists with general purpose toolkits to access various kinds of scientific data. We will also be traveling a whole bunch this year and running workshops at several conferences and universities. If you'd like us to speak to your research...

BISON USGS species occurrence data

Scott Chamberlain — May 27, 2013
The USGS recently released a way to search for and get species occurrence records for the USA. The service is called BISON (Biodiversity Information Serving Our Nation). The service has a web interface for human interaction in a browser, and two APIs (application programming interfaces) to allow machines to interact with their database. One of the APIs allows you to search and retrieve data, and the other gives back maps as either a heatmap or...

rOpenSci updates on packages and the website

Scott Chamberlain — May 20, 2013
We've been busy We have been busy hacking away at code and our website. Here is an update on what we've been up to. Packages rplos/alm PLoS provides two different API services: the Search API and ALM API. As their names suggest, the search API lets you search and get text from their papers and associated metadata. The ALM API allows you to get article level metrics data on PLoS papers. Up until a few...

Facilitating Open Science with Python

Steve Moss — May 16, 2013
A guest blog post by Steve Moss Why Python? A little background! I started using Python in the summer of 2010. I had applied for the Master of Research postgraduate degree in Computational Biology at the University of York. They teach the programming portion of their course using Python. I thought it might be useful to learn it, before starting, to give me a bit of a head start. From the beginning, it was clear...

Introducing the BEFData package

Karthik Ram — May 10, 2013
This is a guest post by Claas-Thido Pfaff. We here present the BEFdata R package as part of the rOpenSci project. It is an API package that combines the strengths of the BEFdata portal in handling small, complex datasets with the powerful statistics package R. The portal itself is free software as well and can be found here. The BEFdata platforms support interdisciplinary data sharing and harmonisation of distributed research projects collaborating with each other....

USGS App Contest

Scott Chamberlain — April 22, 2013
Many US federal agencies are now running app competitions to highlight their web services (see here), and hopefully get people to build cool stuff using government data (see Data.gov for more). See here for a nice list of the US government's web services. One of these agencies was the United States Geological Survey (USGS). They opened up an app competition and we won best overall app! Check out our app called TaxaViewer here: http://glimmer.rstudio.com/ropensci/usgs_app/. We...

Use case - how to get species occurrence data from GBIF for a genus

Scott Chamberlain — April 12, 2013
Real use cases from people using our software are awesome. They are important for many reasons: 1) They make the code more useable because we may change code to make the interface and output easier to understand; 2) They may highlight bugs in our code; and 3) They show us what functions users care the most about (if we can assume number of questions equates to use). If someone has a question, others are likely...

Scholarly metadata in R

Scott Chamberlain — March 15, 2013
Scholarly metadata - the meta-information surrounding articles - can be super useful. Although metadata does not contain the full content of articles, it contains a lot of useful information, including title, authors, abstract, URL to the article, etc. One of the largest sources of metadata is provided via the Open Archives Initiative Protocol for Metadata Harvesting or OAI-PMH. Many publishers provide their metadata through their own endpoint, and implement the standard OAI-PMH methods: GetRecord, Identify,...

Visualizing rOpenSci collaboration

Scott Chamberlain — March 8, 2013
We have been writing code for R packages for a couple years, so it is time to take a look back at the data. What data, you ask? The commits data from GitHub ~ data that records who did what and when. Using the GitHub commits API we can gather data on who committed code to a GitHub repository, and when they did it. Then we can visualize this historical record. Install some functions for...

is.invasive()

Scott Chamberlain — November 26, 2012
The following is a guest post from Ignasi Bartomeus, originally posted on his blog on 26 Nov, 2012. Check out a related blog post here. Note the functionality discussed in this post is now in our taxize package under the function gisd_isinvasive. We hacked out a quick Shiny app so you can play around with the below function in taxize on the web to get invasive status and plot it on a phylogeny. Check it...