rOpenSci | rOpenSci News Digest, March 2023

Dear rOpenSci friends, it’s time for our monthly news roundup!

You can read this post on our blog. Now let’s dive into the activity at and around rOpenSci!

🔗 rOpenSci HQ

🔗 Meeting the stars of the R-universe: Sébastien Rochette

Knowing our community’s stories helps us to learn about the people behind our software, brings us closer and offers us new opportunities. To share some of these community stories, we created the rOpenSci interview series “Meeting the stars of the R-Universe”.

The latest interview with Sébastien Rochette introduces ThinkR’s Approach to Contributing to a Growing and Friendly R Community. The post is available in Spanish and French too! Don’t miss the trilingual post and the video.

🔗 Discovering and learning everything there is to know about R packages using R-universe

Jeroen Ooms explains how to use R-universe to discover and assess new packages. He distinguishes three levels of navigation when you go shopping for R packages in R-universe:

  1. Search the global ecosystem: find packages, by topic, keyword, ranking, etc.
  2. Browse by maintainer/organization: explore all work from a given group or developer.
  3. The individual package: get detailed information on everything there is to know about a project and instructions for how to start using it.
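Once you reach the third level and want to start using a package, installing it from a universe usually means adding that universe's repository to `repos`. Here is a minimal sketch, assuming the rOpenSci universe lives at https://ropensci.r-universe.dev and that you are happy with the cloud CRAN mirror; `universe_repos()` is a hypothetical helper written for this example, not part of any package.

```r
# Build a `repos` vector that lists an R-universe repository before CRAN.
# `universe_repos()` is a hypothetical helper for illustration only.
universe_repos <- function(universe, cran = "https://cloud.r-project.org") {
  repos <- c(sprintf("https://%s.r-universe.dev", universe), cran)
  names(repos) <- c(universe, "CRAN")
  repos
}

repos <- universe_repos("ropensci")
print(repos)

# Uncomment to actually install (needs an internet connection):
# install.packages("targets", repos = repos)
```

Keeping CRAN in the vector lets `install.packages()` resolve dependencies that are not part of the universe.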

That post was also discussed on the R Weekly highlights podcast hosted by Eric Nantz and Mike Thomas!

🔗 Coworking

Join us for social coworking & office hours monthly on first Tuesdays! Hosted by Steffi LaZerte and various community hosts. Everyone welcome. No RSVP needed. Consult our Events page to find your local time and how to join.

And remember, you can always cowork independently on work related to R, work on packages that tend to be neglected, or work on whatever you need to get done!

  • Tuesday, May 2nd, 9:00 Americas Pacific / 16:00 UTC. Tentative theme: “Spring Cleaning for R packages and scripts”. Hosted by community host TBD and Steffi LaZerte.
    • Explore how other organizations keep their scripts/packages nice and clean
    • Take a look at your R packages and scripts and give them a good spring cleaning*
    • Talk to our community host and other attendees and discuss tips for keeping on top of it all.

* In the northern hemisphere at least; otherwise, give them a good fall cleaning!

🔗 Software 📦

🔗 New packages

The following four packages recently became a part of our software suite:

  • openalexR, developed by Massimo Aria together with Trang Le: A set of tools to extract bibliographic content from the OpenAlex database using its API (https://docs.openalex.org). It is available on CRAN. It has been reviewed by Brianna Lind and Pachá (aka Mauricio Vargas Sepúlveda).

  • rb3, developed by Wilson Freitas together with Marcelo Perlin: Download and parse public files released by B3 and convert them into useful formats and data structures common to data analysis practitioners. It is available on CRAN. It has been reviewed by Mario Gavidia Calderón and Pachá (aka Mauricio Vargas Sepúlveda).

  • tsbox, developed by Christoph Sax: Time series toolkit with identical behavior for all time series classes: ts, xts, data.frame, data.table, tibble, zoo, timeSeries, tsibble, tis or irts. Also converts reliably between these classes. It is available on CRAN. It has been reviewed by Cathy Chamberlin and Nunes Matt.

  • waywiser, developed by Michael Mahoney: Assessing predictive models of spatial data can be challenging, both because these models are typically built for extrapolating outside the original region represented by training data and due to potential spatially structured errors, with “hot spots” of higher than expected error clustered geographically due to spatial structure in the underlying data. Methods are provided for assessing models fit to spatial data, including approaches for measuring the spatial structure of model errors, assessing model predictions at multiple spatial scales, and evaluating where predictions can be made safely. Methods are particularly useful for models fit using the tidymodels framework. It is available on CRAN. It has been reviewed by Virgilio Gómez-Rubio and Jakub Nowosad.

Discover more packages and read more about Software Peer Review.

🔗 New versions

The following fifteen packages have had an update since the last newsletter: c14bazAAR (3.4.1), dynamite (1.2.0), FedData (v3.0.3), geojsonio (v0.11.0), lingtypology (v1.1.12), mctq (v0.3.2), osmdata (v0.2.1), pathviewr (v1.1.7), qualR (v0.9.7), rredlist (v0.7.1), spocc (v1.2.1), tarchetypes (0.7.5), targets (0.14.3), webmockr (v0.9.0), and xslt (v1.4.4).

🔗 Software Peer Review

There are fifteen recently closed and active submissions and two submissions on hold. Issues are at different stages.

Find out more about Software Peer Review and how to get involved.

🔗 On the blog

🔗 Other topics

🔗 Tech Notes

🔗 Call for (co)maintainers

🔗 Call for maintainers

If you’re interested in maintaining any of the R packages below, you might enjoy reading our blog post What Does It Mean to Maintain a Package? (or listening to its discussion on the R Weekly highlights podcast hosted by Eric Nantz and Mike Thomas)!

  • rvertnet. Retrieve, map and summarize data from the VertNet.org archives (http://vertnet.org/). Functions allow searching by many parameters, including taxonomic names, places, and dates. In addition, there is an interface for conducting spatially delimited searches, and another for requesting large datasets via email. Issue for volunteering.

  • natserv. Interface to NatureServe (https://www.natureserve.org/). Includes methods to get data, image metadata, search taxonomic names, and make maps. Issue for volunteering.

  • sofa. Provides an interface to the NoSQL database CouchDB (http://couchdb.apache.org). Methods are provided for managing databases within CouchDB, including creating/deleting/updating/transferring, and managing documents within databases. One can connect with a local CouchDB instance, or a remote CouchDB database such as Cloudant. Documents can be inserted directly from vectors, lists, data.frames, and JSON. Targeted at CouchDB v2 or greater. Issue for volunteering.

  • geojsonlint. Tools for linting GeoJSON. Includes tools for interacting with the online tool http://geojsonlint.com, the JavaScript library geojsonhint (https://www.npmjs.com/package/geojsonhint), and validating against a GeoJSON schema via the JavaScript library is-my-json-valid (https://www.npmjs.com/package/is-my-json-valid). Some tools work locally while others require an internet connection. Issue for volunteering.

  • citesdb. A high-performance database of shipment-level CITES trade data. Provides convenient access to over 40 years and 20 million records of endangered wildlife trade data from the Convention on International Trade in Endangered Species of Wild Fauna and Flora, stored on a local on-disk, out-of-memory DuckDB database for bulk analysis. Issue for volunteering.

🔗 Call for comaintainers

🔗 Package development corner

Some useful tips for R package developers. 👀

🔗 R Consortium’s call for proposals!

The R Consortium’s Infrastructure Steering Committee (ISC) has a call for proposals open until April 1st.

  • The funds can be used for different sizes of projects. The project must have a software development component.
  • Proofs of concept are not funded. No scientific publications or equipment.
  • The idea is that the funds can cover people’s time to develop software.

This might be relevant for your R package work so make sure to read the call, and good luck if you send a proposal! 🚀

🔗 To cache, or not to cache testthat results?

Have you ever wished you could cache testthat results? You’ll find arguments both in favor of and against that idea in this testthat issue – testthat maintainer Hadley Wickham being against the idea.

You might be interested in Kirill Müller’s experimental package lazytest that helps you rerun only the tests that have failed during the last run.

🔗 Check if an R package name is available

The function pak::pkg_name_check() by Gábor Csárdi can be viewed as a replacement for the available package. It has a very nice output. (Also keep in mind that our pkgcheck::pkgcheck() function reports on potentially duplicated function names.)
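Before asking pak about a candidate name, it can be handy to rule out names that R would reject anyway. Below is a small sketch: `is_valid_pkg_name()` and `check_pkg_name()` are hypothetical helpers written for this example, the regular expression is our reading of the naming rule in "Writing R Extensions" (ASCII letters, digits and dots; at least two characters; starts with a letter; does not end with a dot), and the call to `pak::pkg_name_check()` needs pak installed and an internet connection.

```r
# Offline syntax check, based on the naming rule in "Writing R Extensions".
# `is_valid_pkg_name()` is a hypothetical helper, not part of pak.
is_valid_pkg_name <- function(name) {
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)
}

# Hypothetical wrapper: only query online sources for syntactically valid names.
check_pkg_name <- function(name) {
  if (!is_valid_pkg_name(name)) {
    stop("'", name, "' is not a valid R package name.")
  }
  if (!requireNamespace("pak", quietly = TRUE)) {
    stop("Please install the pak package first.")
  }
  pak::pkg_name_check(name)  # needs an internet connection
}

valid <- is_valid_pkg_name(c("waywiser", "my_pkg", "2fast", "ends."))
print(valid)
# TRUE FALSE FALSE FALSE
```

In an interactive session you would then run something like `check_pkg_name("mynewpackage")`, where the name is of course just a placeholder.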

🔗 What if your httptest mock files are suddenly ignored?

Imagine you’ve set up HTTP testing in your package with httptest and all goes well until one day the httptest mock files are ignored. Don’t panic! Check whether the calls that are mocked are still made with httr. Maybe one of your package’s dependencies upgraded their stack? If the calls are made with httr2, the tests need to be updated to httptest2, which thankfully isn’t too hard.
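One quick way to see which HTTP client a dependency declares is to look at its DESCRIPTION file. The sketch below is only a starting point: `http_clients_in()` is a hypothetical helper, it only inspects the Imports field (not Depends or Suggests), and the DESCRIPTION used in the demonstration is made up.

```r
# Report which HTTP client packages (httr, httr2) appear in the Imports
# field of a DESCRIPTION file. `http_clients_in()` is a hypothetical helper.
http_clients_in <- function(description_path) {
  fields <- read.dcf(description_path, fields = "Imports")
  imports <- fields[1, "Imports"]
  if (is.na(imports)) {
    return(character(0))
  }
  # Split on commas, then drop version constraints such as "(>= 1.0.0)"
  # and surrounding whitespace.
  pkgs <- strsplit(imports, ",")[[1]]
  pkgs <- trimws(gsub("\\([^)]*\\)", "", pkgs))
  intersect(c("httr", "httr2"), pkgs)
}

# Self-contained demonstration with a made-up DESCRIPTION file.
tmp <- tempfile()
writeLines(c(
  "Package: somedependency",
  "Version: 2.0.0",
  "Imports:",
  "    httr2 (>= 0.2.0),",
  "    jsonlite"
), tmp)
clients <- http_clients_in(tmp)
print(clients)
```

For an installed dependency you could pass `system.file("DESCRIPTION", package = "somedependency")` instead of the temporary file. If the answer is httr2, that is a hint that httptest mocks will no longer be picked up for those calls.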

🔗 Updates to package checks

We added one new check this month to our pkgcheck system, specifically for statistics packages. Standards are expected to be documented with the srr package throughout the entire code of a package, including within all or most files in the /R and /tests directories. Having documentation distributed throughout code is particularly important to enable reviewers to judge compliance with standards at the relevant locations within the code. Packages which leave a large portion of standards documentation in a default location within a single file now produce an error when checked with pkgcheck, as well as with the srr function, srr_stats_pre_submit.

🔗 Last words

Thanks for reading! If you want to get involved with rOpenSci, check out our Contributing Guide that can help direct you to the right place, whether you want to make code contributions, non-code contributions, or contribute in other ways like sharing use cases.

If you haven’t subscribed to our newsletter yet, you can do so via a form. Until it’s time for our next newsletter, you can keep in touch with us via our website and Mastodon account.