At rOpenSci, we create and curate software to help scientists with the data life cycle. These tools access, download, manage, and archive scientific data in open, reproducible ways. Early on, we realized this could only be a community effort. The variety of scientific data and workflows could only be tackled by drawing on contributions of scientists with field-specific expertise.
With the community approach came challenges. How could we ensure the quality of code written by scientists without formal training in software development practices? How could we drive adoption of best practices among our contributors? How could we create a community that would support each other in this work?
...Are you thinking about submitting a package to rOpenSci’s open peer software review? Considering volunteering to review for the first time? Maybe you’re an experienced package author or reviewer and have ideas about how we can improve.
Join our Community Call on Wednesday, September 13th. We want to get your feedback and we’d love to answer your questions!
Agenda
Speaker bios
Andee Kaplan is a Postdoctoral Fellow at Duke University. She is a recent PhD graduate from the Iowa State University Department of Statistics, where she learned a lot about R and reproducibility by developing a class on data stewardship for Agronomists. Andee has reviewed multiple (two!) packages for rOpenSci, iheatmapr
and getlandsat
, and hopes to one day be on the receiving end of the review process.
As you might remember from my blog post about ropenaq
, I work as a data manager and statistician for an epidemiology project called CHAI for Cardio-vascular health effects of air pollution in Telangana, India. One of our interests in CHAI is determining exposure, and sources of exposure, to PM2.5 which are very small particles in the air that have diverse adverse health effects. You can find more details about CHAI in our recently published protocol paper. In this blog post that partly corresponds to the content of my useR! 2017 lightning talk, I’ll present a package we wrote for dealing with the output of a scientific device, which might remind you of similar issues in your experimental work....
The package FedData
has gone through software review and is now part of rOpenSci. FedData
includes functions to automate downloading geospatial data available from several federated data sources (mainly sources maintained by the US Federal government).
Currently, the package enables extraction from six datasets:
FedData
is designed with the large-scale geographic information system (GIS) use-case in mind: cases where the use of dynamic web-services is impractical due to the scale (spatial and/or temporal) of analysis. It functions primarily as a means of downloading tiled or otherwise spatially-defined datasets; additionally, it can preprocess those datasets by extracting data within an area of interest (AoI), defined spatially. It relies heavily on the sp
, raster
, and rgdal
packages.
Contributing to an open-source community without contributing code is an oft-vaunted idea that can seem nebulous. Luckily, putting vague ideas into action is one of the strengths of the rOpenSci Community, and their package onboarding system offers a chance to do just that.
This was my first time reviewing a package, and, as with so many things in life, I went into it worried that I’d somehow ruin the package-reviewing process— not just the package itself, but the actual onboarding infrastructure…maybe even rOpenSci on the whole.
...