Tools for converting QuadKey-identified datasets (Microsoft's Bing Maps Tile System) into raster images and analyzing Meta (Facebook) Mobility Data
Quadkeyr functions generate raster images based on QuadKey-identified data, facilitating efficient integration of Tile Maps data into R workflows. In particular, Quadkeyr provides support to process and analyze Facebook mobility datasets within the R environment.
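Below is a minimal sketch of the QuadKey scheme itself, written in Python. It is not Quadkeyr's R interface, and all function names are hypothetical. It follows the bit-interleaving rule published for the Bing Maps Tile System: each base-4 digit records whether a tile lies in the east half (+1) and/or the south half (+2) of its parent tile. Every QuadKey at a fixed level maps to exactly one (tile_x, tile_y) pair, so QuadKey-identified records fall on a regular grid. That property is what makes rasterization possible.

```python
import math


def tile_xy_to_quadkey(tile_x: int, tile_y: int, level: int) -> str:
    """Encode tile coordinates at a given level of detail as a QuadKey string."""
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tile_x & mask:
            digit += 1  # tile is in the east half of its parent
        if tile_y & mask:
            digit += 2  # tile is in the south half of its parent
        digits.append(str(digit))
    return "".join(digits)


def quadkey_to_tile_xy(quadkey: str) -> tuple[int, int, int]:
    """Decode a QuadKey into (tile_x, tile_y, level)."""
    tile_x = tile_y = 0
    level = len(quadkey)
    for i, ch in enumerate(quadkey):
        if ch not in "0123":
            raise ValueError(f"invalid QuadKey digit: {ch!r}")
        mask = 1 << (level - i - 1)
        d = int(ch)
        if d & 1:
            tile_x |= mask
        if d & 2:
            tile_y |= mask
    return tile_x, tile_y, level


def tile_nw_corner(tile_x: int, tile_y: int, level: int) -> tuple[float, float]:
    """Longitude/latitude of a tile's north-west corner (Web Mercator)."""
    n = 2 ** level
    lon = tile_x / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * tile_y / n))))
    return lon, lat
```

For example, tile_xy_to_quadkey(3, 5, 3) returns "213", and quadkey_to_tile_xy("213") returns (3, 5, 3). The corner function gives each grid cell its geographic extent.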
Allows users to access the Oregon State PRISM climate data. Using the web service API, data can easily be downloaded in bulk and loaded into R for spatial analysis. Some user-friendly visualizations are also provided.
Fit, interpret, and compute predictions with oblique random forests. Includes support for partial dependence, variable importance, passing customized functions for variable importance, and identification of linear combinations of features. Methods for the oblique random survival forest are described in Jaeger et al. (2023) DOI:10.1080/10618600.2023.2231048.
Static code analyses for R packages using the external code-tagging libraries ctags and gtags. Static analyses enable packages to be analysed very quickly, generally in a couple of seconds at most. The package also provides access to a database generated by applying the main function to the full CRAN archive, enabling the statistical properties of any package to be compared with those of all other CRAN packages.
General-purpose TIFF file I/O for R users. Currently the only such package with read and write support for TIFF files with floating point (real-numbered) pixels, and the only package that can correctly import TIFF files that were saved from ImageJ and write TIFF files that can be correctly read by ImageJ. Also supports text image I/O.
Simplifies the creation of reproducible data science environments using the Nix package manager, as described in Dolstra (2006) <ISBN 90-393-4130-3>. The included rix() function generates a complete description of the environment as a default.nix file, which can then be built using Nix. This results in project-specific software environments with pinned versions of R, packages, linked system dependencies, and other tools. Additional helpers make it easy to run R code in Nix software environments for testing and production.
Pipeline tools coordinate the pieces of computationally demanding analysis projects. The targets package is a Make-like pipeline tool for statistics and data science in R. The package skips costly runtime for tasks that are already up to date, orchestrates the necessary computation with implicit parallel computing, and abstracts files as R objects. If all the current output matches the current upstream code and data, then the whole pipeline is up to date, and the results are more trustworthy than otherwise. The methodology in this package borrows from GNU Make (2015, ISBN:978-9881443519) and drake (2018, doi:10.21105/joss.00550).
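The skip-if-up-to-date idea can be illustrated with a small Python sketch. This is not the targets API. Each hypothetical target gets a fingerprint built from two parts: a hand-written code_version string and the current values of its dependencies. The code_version string is an assumption; it stands in for the code-change detection a real tool performs. make() reruns only the targets whose fingerprint has changed.

```python
import hashlib
from typing import Any, Callable


def _stamp(*parts: str) -> str:
    """Hash a sequence of strings into one fingerprint."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode("utf-8"))
        h.update(b"\x00")  # separator so ("ab", "c") differs from ("a", "bc")
    return h.hexdigest()


class ToyPipeline:
    """Toy Make-like runner (no cycle detection, no parallelism)."""

    def __init__(self) -> None:
        self.steps: dict[str, tuple[list[str], Callable[..., Any], str]] = {}
        self.values: dict[str, Any] = {}
        self.stamps: dict[str, str] = {}

    def target(self, name: str, deps: list[str], func: Callable[..., Any], code_version: str) -> None:
        """Declare or redefine a target; code_version stands in for hashing its code."""
        self.steps[name] = (list(deps), func, code_version)

    def _order(self) -> list[str]:
        """Depth-first topological order: dependencies before dependents."""
        order: list[str] = []
        seen: set[str] = set()

        def visit(name: str) -> None:
            if name in seen:
                return
            seen.add(name)
            for dep in self.steps[name][0]:
                visit(dep)
            order.append(name)

        for name in self.steps:
            visit(name)
        return order

    def make(self) -> list[str]:
        """Run outdated targets only; return the names that were actually rebuilt."""
        rebuilt = []
        for name in self._order():
            deps, func, code_version = self.steps[name]
            # Fingerprint = this target's code version plus its upstream values.
            stamp = _stamp(code_version, *(repr(self.values[d]) for d in deps))
            if self.stamps.get(name) == stamp:
                continue  # up to date: skip the costly step
            self.values[name] = func(*(self.values[d] for d in deps))
            self.stamps[name] = stamp
            rebuilt.append(name)
        return rebuilt
```

Walking through an example:

1. Declare "data" (returning [1, 2, 3]) and "total" (sum). The first make() rebuilds both targets.
2. A second make() rebuilds nothing.
3. Redefine only "total", with a new code_version. Now make() reruns just that target.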
Easy-to-use and efficient interface for Bayesian inference of complex panel (time series) data using dynamic multivariate panel models by Helske and Tikka (2024) doi:10.1016/j.alcr.2024.100617. The package supports joint modeling of multiple measurements per individual, time-varying and time-invariant effects, and a wide range of discrete and continuous distributions. Estimation of these dynamic multivariate panel models is carried out via Stan. For an in-depth tutorial of the package, see Tikka and Helske (2024) doi:10.48550/arXiv.2302.01607.
Metrics for your code repository. Call one function to generate an interactive dashboard displaying the state of your code.
Provides automated downloading, parsing and formatting of weather data for Australia through API endpoints provided by the Department of Primary Industries and Regional Development (DPIRD) of Western Australia and by the Science and Technology Division of the Queensland Government's Department of Environment and Science (DES). It also provides the Australian Government Bureau of Meteorology (BOM) precis and coastal forecasts, and downloads and imports radar and satellite imagery files. DPIRD weather data are accessed through public APIs provided by DPIRD, providing access to weather station data from the DPIRD weather station network. Australia-wide weather data are based on data from the Australian Bureau of Meteorology (BOM) and accessed through SILO (Scientific Information for Land Owners), described in Jeffrey et al. (2001) doi:10.1016/S1364-8152(01)00008-1. DPIRD data are made available under a Creative Commons Attribution 3.0 Licence (CC BY 3.0 AU). SILO data are released under a Creative Commons Attribution 4.0 International licence (CC BY 4.0). BOM data are (c) Australian Government Bureau of Meteorology and released under a Creative Commons (CC) Attribution 3.0 licence or Public Access Licence (PAL) as appropriate, see for further details.
This package computes and visualizes wildfire exposure using the methods documented in a series of scientific publications.
Bindings to ImageMagick: the most comprehensive open-source image processing library available. Supports many common formats (png, jpeg, tiff, pdf, etc) and manipulations (rotate, scale, crop, trim, flip, blur, etc). All operations are vectorized via the Magick++ STL meaning they operate either on a single frame or a series of frames for working with layers, collages, or animation. In RStudio images are automatically previewed when printed to the console, resulting in an interactive editing environment. The latest version of the package includes a native graphics device for creating in-memory graphics or drawing onto images using pixel coordinates.
Function-oriented Make-like declarative pipelines for statistics and data science are supported in the targets R package. As an extension to targets, the tarchetypes package provides convenient user-side functions to make targets easier to use. By establishing reusable archetypes for common kinds of targets and pipelines, these functions help express complicated reproducible pipelines concisely and compactly. The methods in this package were influenced by the drake R package by Will Landau (2018) doi:10.21105/joss.00550.
Download large sections of GenBank and generate a local SQL-based database. A user can then query this database using restez functions or through rentrez wrappers.
BEAST2 is a widely used Bayesian phylogenetic tool that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. BEAUti 2 (which is part of BEAST2) is a GUI tool that allows users to specify the many possible setups and generates the XML file BEAST2 needs to run. This package provides a way to create BEAST2 input files without active user input, using R function calls instead.
BEAST2 is a widely used Bayesian phylogenetic tool that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. BEAST2 is a command-line tool. This package provides a way to call BEAST2 from an R function call.
Integrates population dynamics and dispersal into a mechanistic virtual species simulator. The package can be used to study the effects of environmental change on population growth and range shifts. It allows for simple and straightforward definition of population dynamics (including positive density dependence), extensive possibilities for defining dispersal kernels, and the ability to generate virtual ecologist data. Learn more about rangr at
Provides a convenient API to access immunological data within the CAVD DataSpace, a data sharing and discovery tool that facilitates exploration of HIV immunological data from pre-clinical and clinical HIV vaccine studies.
Intended to facilitate acoustic analysis of (animal) sound propagation experiments, which typically aim to quantify changes in signal structure when transmitted in a given habitat by broadcasting and re-recording animal sounds at increasing distances. The package offers a workflow with functions to prepare the data set for analysis as well as to calculate and visualize several degradation metrics, including blur ratio, signal-to-noise ratio, excess attenuation and envelope correlation among others (Dabelsteen et al. 1993 doi:10.1121/1.406682).
Allows users to access the Federal Emergency Management Agency's (FEMA) publicly available data through its API. The package provides a set of functions to easily navigate and access data from the National Flood Insurance Program along with FEMA's various disaster aid programs, including the Hazard Mitigation Grant Program, the Public Assistance Grant Program, and the Individual Assistance Grant Program.
Provides workflows and guidance for automatic translation of Markdown-based R content using the DeepL API.
Import, process, summarize and visualize raw data from metabolic carts. See Robergs, Dwyer, and Astorino (2010) doi:10.2165/11319670-000000000-00000 for more details on data processing.
The Citation File Format version 1.2.0 doi:10.5281/zenodo.5171937 is a human- and machine-readable file format that provides citation metadata for software. This package provides core utilities to generate and validate this metadata.
Parsing (R)Markdown files with numerous regular expressions can be fraught with peril, but it does not have to be this way. Converting (R)Markdown files to XML using the commonmark package allows in-memory editing of markdown elements via XPath through the extensible R6 class called yarn. These modified XML representations can be written to (R)Markdown documents via an XSLT stylesheet, which implements an extended version of GitHub-flavoured markdown, so that you can tinker to your heart's content.
Convert between anthropometric measures and z-scores/centiles in multiple growth standards, and classify fetal, newborn, and child growth accordingly. With a simple interface to growth standards from the World Health Organisation and International Fetal and Newborn Growth Consortium for the 21st Century, gigs makes growth assessment easy and reproducible for clinicians, researchers and policy-makers.
An API client for the NASA POWER global meteorology, surface solar energy and climatology data API. POWER (Prediction Of Worldwide Energy Resources) data are freely available for download with varying spatial resolutions dependent on the original data and with several temporal resolutions depending on the POWER parameter and community. This work is funded through the NASA Earth Science Directorate Applied Science Program. For more on the data themselves, the methodologies used in creating them, a web-based data viewer and web access, please see
Accesses Weather Data from the Iowa Environment Mesonet
Working with Sets the Tidy Way
rOpenSci package review project template
Bespoke Images of OpenStreetMap Data
An R package to download São Paulo and Rio de Janeiro air pollution data
Getting Bibliographic Records from OpenAlex Database Using DSL API
rOpenSci Package Checks
Plumber API to report package structure and function
Assessing predictive models of spatial data can be challenging for two reasons. First, these models are typically built for extrapolating outside the original region represented by training data. Second, their errors may be spatially structured, with “hot spots” of higher than expected error clustered geographically due to spatial structure in the underlying data. Methods are provided for assessing models fit to spatial data, including approaches for measuring the spatial structure of model errors, assessing model predictions at multiple spatial scales, and evaluating where predictions can be made safely. Methods are particularly useful for models fit using the tidymodels framework. Methods include:
- Moran's I (Moran (1950) doi:10.2307/2332142)
- Geary's C (Geary (1954) doi:10.2307/2986645)
- Getis-Ord's G (Ord and Getis (1995) doi:10.1111/j.1538-4632.1995.tb00912.x)
- agreement coefficients from Ji and Gallo (2006) (doi:10.14358/PERS.72.7.823)
- agreement metrics from Willmott (1981) (doi:10.1080/02723646.1981.10642213) and Willmott et al. (2012) (doi:10.1002/joc.2419)
- an implementation of the area of applicability methodology from Meyer and Pebesma (2021) (doi:10.1111/2041-210X.13650)
- an implementation of multi-scale assessment as described in Riemann et al. (2010) (doi:10.1016/j.rse.2010.05.010)
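To make one of these statistics concrete, here is a minimal Python sketch of global Moran's I applied to model errors, such as observed minus predicted values at n locations. The plain-list weights matrix is an assumption made to keep the example self-contained; it is not the R package's interface. A value well above the no-autocorrelation expectation of -1/(n - 1) suggests spatially clustered errors.

```python
def morans_i(values: list[float], weights: list[list[float]]) -> float:
    """Global Moran's I for values observed at n locations.

    weights[i][j] is the spatial weight between locations i and j
    (zero on the diagonal, e.g. 1 for neighbours and 0 otherwise).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations for every pair of locations.
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    variance_term = sum(d * d for d in dev)
    return (n / w_total) * (cross / variance_term)
```

Consider four locations along a line, each neighbouring the next. Errors of [1, 1, 0, 0] give about 0.33 (positive clustering). Alternating errors of [1, 0, 1, 0] give -1.0 (perfect dispersion).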
Download and parse public files released by B3 and convert them into useful formats and data structures common to data analysis practitioners.
Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
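The shingling and MinHash ideas can be sketched in Python. This is not the package's R API. The approach has three steps:

1. Each document becomes a set of word 3-gram shingles.
2. Jaccard similarity compares those sets exactly.
3. A MinHash signature estimates that similarity cheaply. This works because, for a random hash function, two sets share the same minimum hash with probability equal to their Jaccard similarity.

Seeded BLAKE2 hashes stand in for a family of random hash functions; that choice is an assumption of this sketch. Locality-sensitive hashing and Smith-Waterman alignment are not shown.

```python
import hashlib


def shingles(text: str, n: int = 3) -> set[str]:
    """Word n-gram shingles of a document (lower-cased, whitespace-tokenized)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(seed: int, item: str) -> int:
    # Keyed 64-bit BLAKE2 hash; each seed plays the role of one random hash function.
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8, key=seed.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items: set[str], num_hashes: int = 128) -> list[int]:
    """For each hash function, keep the minimum hash value over the set."""
    if not items:
        raise ValueError("cannot sign an empty set")
    return [min(_hash(seed, s) for s in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of agreeing signature positions approximates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

For example, "the quick brown fox jumps over the lazy dog" and "the quick brown fox leaps over the lazy dog" share 4 of 10 distinct 3-word shingles, giving an exact Jaccard similarity of 0.4. The MinHash estimate should land near that value.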
Provides functionality to download archives (backups) of all repositories in a GitHub organization.
Geospatial data computation is parallelized by grid, hierarchy, or raster files. Based on the future and mirai parallel backends, terra and sf functions, as well as convenience functions in the package, can be distributed over multiple threads. The simplest way of parallelizing generic geospatial computation is to start with the par_pad_* functions and then use the par_grid, par_hierarchy, or par_multirasters functions. Virtually any function accepting classes from the terra or sf packages can be used in the three parallelization functions. A common raster-vector overlay operation is provided as a function, extract_at, which uses exactextractr, with options for kernel weights for summarizing raster values at vector geometries. Other convenience functions for vector-vector operations, including simple areal interpolation (summarize_aw) and summation of exponentially decaying weights (summarize_sedc), are also provided.
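Simple areal interpolation reallocates values from source polygons to a target polygon in proportion to overlapping area. The Python sketch below illustrates that general technique, not the package's implementation. It assumes axis-aligned rectangles so that overlaps can be computed without a geometry library. The extensive/intensive switch is a common convention in areal interpolation, not a documented option of summarize_aw.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle standing in for a polygon."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles (0 if they do not overlap)."""
    width = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    height = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return max(0.0, width) * max(0.0, height)


def areal_interpolate(sources: list[tuple[Rect, float]], target: Rect, extensive: bool = True) -> float:
    """Transfer source values to the target polygon by overlapping area."""
    if extensive:
        # Counts (e.g. population): each source contributes the share of its area inside the target.
        return sum(value * overlap_area(rect, target) / rect.area() for rect, value in sources)
    # Intensive values (e.g. density): area-weighted mean over the overlapping pieces.
    pieces = [(overlap_area(rect, target), value) for rect, value in sources]
    covered = sum(w for w, _ in pieces)
    if covered == 0:
        raise ValueError("target does not overlap any source")
    return sum(w * v for w, v in pieces) / covered
```

Take two unit squares with values 10 and 20, and a target covering all of the first square and half of the second. Treated as counts, the target receives 10 + 10 = 20. Treated as an intensive quantity, the target receives the area-weighted mean, 40/3.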
Acknowledge all contributors to a project via a single function call. The function appends a table with the names of all individuals who contributed via code or repository issues to a README or other specified file(s). The package also includes several additional functions to extract and quantify contributions to any repository.
Import OpenStreetMap Data as Simple Features or Spatial Objects
Download and import of OpenStreetMap (OSM) data as sf or sp objects. OSM data are extracted from the Overpass web server and processed with very fast C++ routines for return to R.
Stubbing and setting expectations on HTTP requests. Includes tools for stubbing HTTP requests, including expected request conditions and response conditions. Match on HTTP method, query parameters, request body, headers and more. Can be used for unit tests or outside of a testing context.
Provides functions that automate downloading and importing University of East Anglia Climate Research Unit (CRU) CL v. 2.0 climatology data, facilitates the calculation of minimum temperature and maximum temperature and formats the data into a data.table object or a list of terra rast objects for use. CRU CL v. 2.0 data are a gridded climatology of 1961-1990 monthly means released in 2002 and cover all land areas (excluding Antarctica) at 10 arc minutes (0.1666667 degree) resolution. For more information see the description of the data provided by the University of East Anglia Climate Research Unit,
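The description implies that CRU CL v. 2.0 supplies mean temperature (tmp) and diurnal temperature range (dtr) rather than minimum and maximum temperature directly. On that assumption, minimum and maximum follow as the mean minus or plus half the range. Below is a minimal Python sketch of that arithmetic for monthly values. The function is hypothetical, not the package's R interface.

```python
def tmin_tmax(tmp: list[float], dtr: list[float]) -> tuple[list[float], list[float]]:
    """Derive monthly minimum and maximum temperature (°C) from mean and diurnal range."""
    if len(tmp) != len(dtr):
        raise ValueError("tmp and dtr must have the same length")
    tmin = [t - r / 2 for t, r in zip(tmp, dtr)]  # mean minus half the daily range
    tmax = [t + r / 2 for t, r in zip(tmp, dtr)]  # mean plus half the daily range
    return tmin, tmax
```

For instance, a month with a mean of 15 °C and a diurnal range of 10 °C gives a minimum of 10 °C and a maximum of 20 °C.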
Provides a way to describe common build and deployment workflows for R-based projects: packages, websites (e.g. blogdown, pkgdown), or data processing (e.g. research compendia). The recipe is described independent of the continuous integration tool used for processing the workflow (e.g. GitHub Actions or Circle CI). This package has been peer-reviewed by rOpenSci (v0.3.0.9004).
Records test suite HTTP requests and replays them during future runs. A port of the Ruby gem of the same name. Works by hooking into the webmockr R package for matching HTTP requests by various rules (HTTP method, URL, query parameters, headers, body, etc.), and then caching real HTTP responses on disk in cassettes. Subsequent HTTP requests matching any previous requests in the same cassette use a cached HTTP response.
An increasingly important source of health-related bibliographic content is preprints: preliminary versions of research articles that have yet to undergo peer review. The two preprint repositories most relevant to health-related sciences are medRxiv and bioRxiv, both of which are operated by the Cold Spring Harbor Laboratory. medrxivr provides programmatic access to the Cold Spring Harbor Laboratory (CSHL) API, allowing users to easily download medRxiv and bioRxiv preprint metadata (e.g. title, abstract, publication date, author list, etc.) into R. medrxivr also provides functions to search the downloaded preprint records using regular expressions and Boolean logic. Helper functions allow users to export their search results to a .BIB file for easy import into a reference manager, and to download the full-text PDFs of preprints matching their search criteria.
Provides functions and dictionaries for recoding free-text gender responses into more consistent categories.
Access the United States National Provider Identifier Registry API. Obtain and transform administrative data linked to a specific individual or organizational healthcare provider, or perform advanced searches based on provider name, location, type of service, credentials, and other attributes exposed by the API.
Detects spatial and temporal groups in GPS relocations (Robitaille et al. (2019) doi:10.1111/2041-210X.13215). It can be used to convert GPS relocations to gambit-of-the-group format to build proximity-based social networks. In addition, the randomizations function provides data-stream randomization methods suitable for GPS data.
Query different C14 date databases and apply basic data cleaning, merging and calibration steps. Currently available databases: 14cpalaeolithic, 14sea, adrac, agrichange, aida, austarch, bda, calpal, caribbean, eubar, euroevol, irdd, jomon, katsianis, kiteeastafrica, medafricarbon, mesorad, neonet, neonetatl, nerd, p3k14c, pacea, palmisano, rado.nb, rxpand, sard, xronos.
Bayesian data analysis usually incurs long runtimes and cumbersome custom code. A pipeline toolkit tailored to Bayesian statisticians, the stantargets R package leverages targets and cmdstanr to ease these burdens. stantargets makes it super easy to set up scalable Stan pipelines that automatically parallelize the computation and skip expensive steps when the results are already up to date. Minimal custom code is required, and there is no need to manually configure branching, so usage is much easier than targets alone. stantargets can access all of cmdstanr's major algorithms (MCMC, variational Bayes, and optimization) and it supports both single-fit workflows and multi-rep simulation studies. For the statistical methodology, please refer to Stan's documentation (Stan Development Team 2020).
Creating dendrochronological networks based on the similarity between tree-ring series or chronologies. The package includes various functions to compare tree-ring curves building upon the dplR package. The networks can be used to visualise and understand the relations between tree-ring curves. These networks are also very useful to estimate the provenance of wood as described in Visser (2021) DOI:10.5334/jcaa.79 or wood-use within a structure/context/site as described in Visser and Vorst (2022) DOI:10.1163/27723194-bja10014.
A programmatic interface to the web service methods provided by Global Biotic Interactions (GloBI). GloBI provides access to spatial-temporal species interaction records from sources all over the world. rglobi provides methods to search species interactions by location, interaction type, and taxonomic name.
Uses the node library is-my-json-valid or ajv to validate JSON against a JSON schema. Drafts 04, 06 and 07 of JSON schema are supported.
Enables preparation of maps to be printed and drawn on. Modified maps can then be scanned back in, and hand-drawn marks converted to spatial objects.
Encryption wrappers, using low-level support from sodium and openssl. cyphr tries to smooth over some pain points when using encryption within applications and data analysis by wrapping around differences in function names and arguments in different encryption providing packages. It also provides high-level wrappers for input/output functions for seamlessly adding encryption to existing analyses.
Automate rendering and cross-linking of Quarto books following a prescribed structure.
Utilities to interact with the R-universe platform. Includes functions to manage local package repositories, as well as API wrappers for retrieving data and metadata about packages in r-universe.
Interacts with a suite of web application programming interfaces (API) for taxonomic tasks, such as getting database specific taxonomic identifiers, verifying species names, getting taxonomic hierarchies, fetching downstream and upstream taxonomic names, getting taxonomic synonyms, converting scientific to common names and vice versa, and more. Some of the services supported include NCBI E-utilities, Encyclopedia of Life, Global Biodiversity Information Facility, and many more. Links to the API documentation for other supported services are available in the documentation for their respective functions in this package.
A programmatic client for the eBird database, including functions for searching for bird observations by geographic location (latitude, longitude), eBird hotspots, location identifiers, notable sightings, region, and taxonomic name.
A programmatic interface to the Web Service methods provided by the Global Biodiversity Information Facility (GBIF). GBIF is a database of species occurrence records from sources all over the globe. rgbif includes functions for searching for taxonomic names, retrieving information on data providers, getting species occurrence records, getting counts of occurrence records, and using the GBIF tile map service to make rasters summarizing huge amounts of data.
Provides means for downloading historical weather data from the Environment and Climate Change Canada website. Data can be downloaded from multiple stations and over large date ranges and automatically processed into a single dataset. Tools are also provided to identify stations either by name or proximity to a location.
Parse various reflectance/transmittance/absorbance spectra file formats to extract spectral data and metadata, as described in Gruson, White & Maia (2019) doi:10.21105/joss.01857. Among other formats, it can import files from Avantes, CRAIC, and OceanOptics/OceanInsight brands.
It includes tests for multivariate normality, tests for uniformity on the d-dimensional sphere, non-parametric two- and k-sample tests, random generation of points from the Poisson kernel-based density, and a clustering algorithm for spherical data. For more information see Saraceno G., Markatou M., Mukhopadhyay R. and Golzy M. (2024) doi:10.48550/arXiv.2402.02290, Markatou, M. and Saraceno, G. (2024) doi:10.48550/arXiv.2407.16374, Ding, Y., Markatou, M. and Saraceno, G. (2023) doi:10.5705/ss.202022.0347, and Golzy, M. and Markatou, M. (2020) doi:10.1080/10618600.2020.1740713.
Provides automated downloading, parsing, cleaning, unit conversion and formatting of Global Surface Summary of the Day (GSOD) weather data from the USA National Centers for Environmental Information (NCEI). Units are converted from United States Customary System (USCS) units to International System of Units (SI). Stations may be individually checked for a user-defined number of missing days, and stations with too many missing observations are omitted. Only stations with valid reported latitude and longitude values are permitted in the final data. Additional useful elements, saturation vapour pressure (es), actual vapour pressure (ea) and relative humidity (RH), are calculated from the original data using the improved August-Roche-Magnus approximation (Alduchov & Eskridge 1996) and included in the final data set. The resulting metadata include station identification information, country, state, latitude, longitude, elevation, weather observations and associated flags. For information on the GSOD data from NCEI, please see the GSOD readme.txt file available from,
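The derived humidity variables follow directly from air temperature and dew point. The Python sketch below shows the arithmetic under a few assumptions. First, both temperatures are converted from °F to °C. Second, ea is the saturation vapour pressure evaluated at the dew point, which is the usual approach. Third, vapour pressures are expressed in kPa. The sketch illustrates the Magnus-type formula named above with the Alduchov & Eskridge (1996) coefficients; it is not the package's R code.

```python
import math

# Improved August-Roche-Magnus coefficients (Alduchov & Eskridge 1996);
# the leading constant is 6.1094 hPa, written here in kPa.
A_KPA = 0.61094
B = 17.625
C_DEG = 243.04


def fahrenheit_to_celsius(f: float) -> float:
    return (f - 32.0) * 5.0 / 9.0


def saturation_vapour_pressure(t_c: float) -> float:
    """es (kPa) at air temperature t_c (°C)."""
    return A_KPA * math.exp(B * t_c / (t_c + C_DEG))


def actual_vapour_pressure(dewpoint_c: float) -> float:
    """ea (kPa): the saturation vapour pressure evaluated at the dew point."""
    return saturation_vapour_pressure(dewpoint_c)


def relative_humidity(t_c: float, dewpoint_c: float) -> float:
    """RH (%) as the ratio of actual to saturation vapour pressure."""
    return 100.0 * actual_vapour_pressure(dewpoint_c) / saturation_vapour_pressure(t_c)
```

When the dew point equals the air temperature, RH is 100 %. An air temperature of 20 °C with a 10 °C dew point gives an RH of roughly 52 to 53 %.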
Tools for working with taxonomic databases, including utilities for downloading databases, loading them into various SQL databases, cleaning up files, and providing a SQL connection that can be used to do SQL queries directly or used in dplyr.
An interface to the Integrated Taxonomic Information System (ITIS). Includes functions to work with the ITIS REST API methods, as well as the Solr web service.
IUCN Red List client. The IUCN Red List is a global list of threatened and endangered species. Functions cover all of the Red List API routes. An API key is required.
The Codemeta Project defines a JSON-LD format for describing software metadata, as detailed at This package provides utilities to generate, parse, and modify codemeta.json files automatically for R packages, as well as tools and examples for working with codemeta.json JSON-LD more generally.
A programmatic interface to many species occurrence data sources, including Global Biodiversity Information Facility (GBIF), iNaturalist, eBird, Integrated Digitized Biocollections (iDigBio), VertNet, Ocean Biogeographic Information System (OBIS), and Atlas of Living Australia (ALA). Includes functionality for retrieving species occurrence data, and combining those data.
Provides an interface to the NoSQL database CouchDB. Methods are provided for managing databases within CouchDB, including creating/deleting/updating/transferring, and managing documents within databases. One can connect with a local CouchDB instance, or a remote CouchDB database such as Cloudant. Documents can be inserted directly from vectors, lists, data.frames, and JSON. Targeted at CouchDB v2 or greater.
Parse messy geographic coordinates from various character formats to decimal degree numeric values. Parse coordinates into their parts (degree, minutes, seconds); calculate hemisphere from coordinates; extract degrees, minutes, or seconds individually; add and subtract degrees, minutes, and seconds. The C++ code herein was originally inspired by code written by Jeffrey D. Bogan, but has since been completely rewritten.
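The core degrees-minutes-seconds conversion can be sketched in Python. The sketch assumes the input holds one to three numbers (degrees, minutes, seconds) separated by arbitrary symbols. Those numbers may be preceded by a minus sign or accompanied by a standalone hemisphere letter. parse_coordinate is a hypothetical helper, far less tolerant of messy input than the package's C++ parser.

```python
import re

_NUMBER = re.compile(r"\d+(?:\.\d+)?")
# A hemisphere letter standing alone (not part of a word such as "deg" or "min").
_HEMISPHERE = re.compile(r"(?<![A-Za-z])[NSEWnsew](?![A-Za-z])")


def parse_coordinate(text: str) -> float:
    """Convert strings such as 45°30'36"S, '122 15 W' or '-12.5' to decimal degrees."""
    s = text.strip()
    parts = [float(x) for x in _NUMBER.findall(s)]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"cannot parse coordinate: {text!r}")
    # Missing minutes/seconds default to zero.
    degrees, minutes, seconds = (parts + [0.0, 0.0])[:3]
    if minutes >= 60 or seconds >= 60:
        raise ValueError(f"minutes/seconds out of range: {text!r}")
    value = degrees + minutes / 60.0 + seconds / 3600.0
    hemisphere = _HEMISPHERE.search(s)
    # South and west (or a leading minus sign) are negative in decimal degrees.
    negative = s.startswith("-") or (
        hemisphere is not None and hemisphere.group().upper() in ("S", "W")
    )
    return -value if negative else value
```

For example, parse_coordinate("122 15 W") returns -122.25, and parse_coordinate("45 deg 30 min S") returns -45.5.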
Client for the Open Citations Corpus. Includes a set of functions for getting one identifier type from another, as well as getting references and citations for a given identifier.
Make fake data that looks realistic, supporting addresses, person names, dates, times, colors, coordinates, currencies, digital object identifiers (DOIs), jobs, phone numbers, DNA sequences, doubles and integers from distributions and within a range.
The OPtical TRapezoid Model (OPTRAM) derives soil moisture based on the linear relation between a vegetation index and Land Surface Temperature (LST). The Short Wave Infrared (SWIR) band is used as a proxy for LST. See Sadeghi, M. et al. (2017).
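This sketch follows the OPTRAM formulation of Sadeghi et al. (2017) as we understand it. SWIR reflectance R is first transformed to STR = (1 - R)^2 / (2R). Soil moisture is then scaled between a dry edge and a wet edge, each assumed linear in the vegetation index (here NDVI). The Python below assumes the edge intercepts and slopes have already been fitted from a scene's NDVI-STR scatter. The parameter values in the usage example are made up for illustration, and the function names are hypothetical.

```python
def swir_transformed_reflectance(r_swir: float) -> float:
    """STR = (1 - R)^2 / (2R) for SWIR reflectance R in (0, 1]."""
    if not 0.0 < r_swir <= 1.0:
        raise ValueError("SWIR reflectance must be in (0, 1]")
    return (1.0 - r_swir) ** 2 / (2.0 * r_swir)


def optram_moisture(ndvi: float, r_swir: float,
                    i_dry: float, s_dry: float,
                    i_wet: float, s_wet: float) -> float:
    """Normalized soil moisture: 0 on the dry edge, 1 on the wet edge."""
    strv = swir_transformed_reflectance(r_swir)
    dry_edge = i_dry + s_dry * ndvi  # STR of the driest pixels at this NDVI
    wet_edge = i_wet + s_wet * ndvi  # STR of the wettest pixels at this NDVI
    return (strv - dry_edge) / (wet_edge - dry_edge)
```

With illustrative edges (i_dry=0, s_dry=0.5, i_wet=1, s_wet=2) and NDVI = 0.5, a SWIR reflectance of 0.25 gives STR = 1.125. That STR sits halfway between the edges, so the moisture value is 0.5.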
Zero-dependency data frame to xlsx exporter based on libxlsxwriter. Fast, and no Java or Excel required.
Transform Google Docs into Quarto Books
pkgdown template and utilities for rOpenSci docs
rOpenSci's blog guidance
Unleash Useful Linebreaks in Markdown Documents
Bindings to FFmpeg AV library for working with audio and video in R. Generates high quality video from images or R graphics with custom audio. Also offers high performance tools for reading raw audio, creating spectrograms, and converting between countless audio / video formats. This package interfaces directly to the C API and does not require any command line utilities.
Access and interrogate EMODnet (European Marine Observation and Data Network) Web Feature Service data through R.
A set of functions for calculating climatic and hydrological indices and statistics from tidy data. Includes a function for plotting georeferenced results together with cartographic information.
Perform large scale genomic data retrieval and functional annotation retrieval. This package aims to provide users with a standardized way to automate genome, proteome, RNA, coding sequence (CDS), GFF, and metagenome retrieval from NCBI RefSeq, NCBI Genbank, ENSEMBL, and UniProt databases. Furthermore, an interface to the BioMart database (Smedley et al. (2009) doi:10.1186/1471-2164-10-22) allows users to retrieve functional annotation for genomic loci. In addition, users can download entire databases such as NCBI RefSeq (Pruitt et al. (2007) doi:10.1093/nar/gkl842), NCBI nr, NCBI nt, NCBI Genbank (Benson et al. (2013) doi:10.1093/nar/gks1195), etc. with only one command.
Facilitates mapping by making natural earth map data from http:// more easily available to R users. Focuses on vector data.
A simple to use summary function that can be used with pipes and displays nicely in the console. The default summary statistics may be modified by the user as can the default formatting. Support for data frames and vectors is included, and users can implement their own skim methods for specific object types as described in a vignette. Default summaries include support for inline spark graphs. Instructions for managing these on specific operating systems are given in the “Using skimr” vignette and the README.
Species trait data from many different sources, including sequence data from NCBI, plant trait data from BETYdb, data from EOL Traitbank, Birdlife International, and more.
Geocode with the OpenCage API, either from place name to longitude and latitude (forward geocoding) or from longitude and latitude to the name and address of a location (reverse geocoding), see
Models integrate environmental DNA (eDNA) detection data and traditional survey data to jointly estimate species catch rate (see package vignette). Models can be used with count data via traditional survey methods (e.g., trapping, electrofishing, visual) and replicated eDNA detection/nondetection data via polymerase chain reaction (i.e., PCR or qPCR) from multiple survey locations. Estimated parameters include the probability of a false positive eDNA detection, site-level covariates that scale the sensitivity of eDNA surveys relative to traditional surveys, and catchability coefficients for traditional gear types. Models are implemented with a Bayesian framework (Markov chain Monte Carlo) using the Stan probabilistic programming language.
The git2rdata package is an R package for writing and reading dataframes as plain text files. A metadata file stores important information.
1) Storing metadata allows the classes of variables to be maintained. By default, git2rdata optimizes the data for file storage. The optimization is most effective on data containing factors. The optimization makes the data less human-readable. The user can turn this off when they prefer a human-readable format over smaller files. Details on the implementation are available in vignette("plain_text", package = "git2rdata").
2) Storing metadata also allows smaller row-based diffs between two consecutive commits. This is a useful feature when storing data as plain text files under version control. Details on this part of the implementation are available in vignette("version_control", package = "git2rdata"). Although we envisioned git2rdata with a git workflow in mind, you can use it in combination with other version control systems like subversion or mercurial.
3) git2rdata is a useful tool in a reproducible and traceable workflow. vignette("workflow", package = "git2rdata") gives a toy example.
4) vignette("efficiency", package = "git2rdata") provides some insight into the efficiency of file storage, git repository size and speed for writing and reading.
Explore and retrieve marine geospatial data from the Marine Regions Gazetteer and the Marine Regions Data Products.
Downloads spatial data from spatiotemporal asset catalogs (STAC), computes standard spectral indices from the Awesome Spectral Indices project (Montero et al. (2023) doi:10.1038/s41597-023-02096-0) against raster data, and glues the outputs together into predictor bricks. Methods focus on interoperability with the broader spatial ecosystem; function arguments and outputs use classes from sf and terra, and data downloading functions support complex CQL2 queries using rstac.
API client for the Climate Hazards Center CHIRPS and CHIRTS data. CHIRPS is a quasi-global (50°S – 50°N) high-resolution (0.05 arc-degrees) rainfall data set, which incorporates satellite imagery and in-situ station data to create gridded rainfall time series for trend analysis and seasonal drought monitoring. CHIRTS is a quasi-global (60°S – 70°N), high-resolution data set of daily maximum and minimum temperatures. For more details on CHIRPS and CHIRTS data, please visit their official home pages.
Match, download, convert and import OpenStreetMap data extracts obtained from several providers.
Suite of tools for managing cached files, targeting use in other R packages. Uses rappdirs for cross-platform paths. Provides utilities to manage cache directories, including targeting files by path or by key; cached directories can be compressed and uncompressed easily to save disk space.
Interface to the libgit2 library, which is a pure C implementation of the Git core methods. Provides access to Git repositories to extract data and run some basic Git commands.
Tools to help download, process and analyse the UK road collision data collected using the STATS19 form. The datasets are provided as CSV files with detailed road safety information about the circumstances of car crashes and other incidents on the roads resulting in casualties in Great Britain from 1979 to the present. Tables are available on collisions and their circumstances (e.g. speed limit of road), the vehicles involved (e.g. type of vehicle), and casualties (e.g. age). The statistics relate only to events on public roads that were reported to the police, and subsequently recorded, using the STATS19 collision reporting form. See the Department for Transport website for more information on these datasets. The package is described in a paper in the Journal of Open Source Software (Lovelace et al. 2019) doi:10.21105/joss.01181. See Gilardi et al. (2022) doi:10.1111/rssa.12823, Vidal-Tortosa et al. (2021) doi:10.1016/j.jth.2021.101291, and Tait et al. (2023) doi:10.1016/j.aap.2022.106895 for examples of how the data can be used for methodological and empirical road safety research.
Read ODS (OpenDocument Spreadsheet) files into R as data frames. Also supports writing data frames to ODS files.
Download and process public domain works in the Project Gutenberg collection. Includes metadata for all Project Gutenberg works, so that they can be searched and retrieved.
Tools for interacting with the Circle CI API. Besides executing common tasks such as querying build logs and restarting builds, this package also helps set up permissions to deploy from builds.
General purpose R client for ERDDAP™ servers. Includes functions to search for datasets, get summary information on datasets, and fetch datasets in either CSV or netCDF format.
Prism is a lightweight, extensible syntax highlighter, built with modern web standards in mind. This package provides server-side rendering in R using V8 such that no JavaScript library is required in the resulting HTML documents. Over 400 languages are supported.
Tools to get and maintain a data repository from third-party data providers.
A complete toolkit for processing the Munich ChronoType Questionnaire (MCTQ) in its three versions: standard, micro, and shift. The MCTQ is a quantitative and validated tool used to assess chronotypes based on individuals’ sleep behavior. It was originally presented by Till Roenneberg, Anna Wirz-Justice, and Martha Merrow in 2003 (doi:10.1177/0748730402239679).
Extract and process bird sighting records from eBird, an online tool for recording bird observations. Public access to the full eBird database is via the eBird Basic Dataset (EBD), a downloadable text file. This package is an interface to AWK for extracting data from the EBD based on taxonomic, spatial, or temporal filters, to produce a manageable file size that can be imported into R.
Bindings for the Tabula Java library, which can extract tables from PDF files. This tool can reduce time and effort in data extraction processes in fields like investigative journalism. It allows for automatic and manual table extraction, the latter facilitated through a Shiny interface that lets users select areas with a computer mouse for data retrieval.
A programmatic interface to FishBase, re-written based on an accompanying RESTful API. Access tables describing over 30,000 species of fish, their biology, ecology, morphology, and more. This package also supports experimental access to SeaLifeBase data, which contains nearly 200,000 species records for all types of aquatic life not covered by FishBase.
Provides functions to simplify the PatentsView API query language, send GET and POST requests to the API’s twenty-seven endpoints, and parse the data that comes back.
Chemical information from around the web. This package interacts with a suite of web services for chemical information. Sources include: Alan Wood’s Compendium of Pesticide Common Names, Chemical Identifier Resolver, ChEBI, Chemical Translation Service, ChemSpider, ETOX, Flavornet, NIST Chemistry WebBook, OPSIN, PubChem, SRS, Wikidata.
Set up and connect to OpenTripPlanner (OTP). OTP is an open source platform for multi-modal and multi-agency journey planning written in Java. The package allows you to manage a local version or connect to a remote OTP server to find walking, cycling, driving, or transit routes. This package has been peer-reviewed by rOpenSci.
Interface to the ZeroMQ lightweight messaging kernel.
An extension for the xml2 package to transform XML documents by applying an XSLT stylesheet.
Client for jq, a JSON processor written in C. jq allows the following with JSON data: index into, parse, do calculations, cut up and filter, change key names and values, perform conditionals and comparisons, and more.
Bindings to libsodium, a modern, easy-to-use software library for encryption, decryption, signatures, password hashing and more. Sodium uses curve25519, a state-of-the-art Diffie-Hellman function by Daniel Bernstein, which has become very popular after it was discovered that the NSA had backdoored Dual EC DRBG.
A programmatic interface to the Web Service methods provided by Bold Systems for genetic barcode data. Functions include methods for searching for sequences by taxonomic names, IDs, collectors, and institutions, as well as a function for searching for specimens and downloading trace files.
Facilitates mapping by making Natural Earth map data more easily available to R users.
Download geospatial data available from several federated data sources (mainly sources maintained by the US Federal government). Currently, the package enables extraction from nine datasets: the National Elevation Dataset digital elevation models (1 and 1/3 arc-second; USGS); the National Hydrography Dataset (USGS); the Soil Survey Geographic (SSURGO) database from the National Cooperative Soil Survey (NCSS), which is led by the Natural Resources Conservation Service (NRCS) under the USDA; the Global Historical Climatology Network (GHCN), coordinated by the National Climatic Data Center at NOAA; the Daymet gridded estimates of daily weather parameters for North America, version 4, available from the Oak Ridge National Laboratory’s Distributed Active Archive Center (DAAC); the International Tree Ring Data Bank; the National Land Cover Database (NLCD); the Cropland Data Layer from the National Agricultural Statistics Service (NASS); and the PAD-US dataset of protected area boundaries (USGS).
Data that are collected through online sources such as Mechanical Turk may require excluding rows because of IP address duplication, geolocation, or completion duration. This package facilitates exclusion of these data for Qualtrics datasets.
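The exclusion logic described above can be illustrated with a short Python sketch. It is not this package's R interface: the column names `ip` and `duration_s`, the 60-second threshold, and the strict policy of dropping every row whose IP address repeats are all assumptions chosen for illustration.

```python
from collections import Counter


def exclude_rows(rows, min_duration_s=60):
    """Split survey rows into (kept, excluded) lists.

    rows: list of dicts with the hypothetical keys "ip" and "duration_s".
    A row is excluded if its IP address occurs more than once in the data
    or if the respondent finished in under `min_duration_s` seconds.
    """
    ip_counts = Counter(row["ip"] for row in rows)
    kept, excluded = [], []
    for row in rows:
        if ip_counts[row["ip"]] > 1 or row["duration_s"] < min_duration_s:
            excluded.append(row)
        else:
            kept.append(row)
    return kept, excluded
```

Returning the excluded rows alongside the kept ones makes it easy to report how many responses each rule removed.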
Dates are commonly represented in many different formats: the order of day, month, and year can differ, different separators ("-", "/", or whitespace) can be used, months can be given as numbers, names, or abbreviations, and years can be given as two or four digits. datefixR takes dates in all these formats and converts them to R's built-in Date class. If datefixR cannot standardize a date, for example because it is too malformed, the user is told which date could not be standardized and the corresponding ID for the row. datefixR also allows the imputation of missing days and months with user-controlled behavior.
Provides a client for (1) querying the DHS API for survey indicators and metadata, (2) identifying surveys and datasets for analysis, (3) downloading survey datasets from the DHS website, (4) loading datasets and associated metadata into R, and (5) extracting variables and combining datasets for pooled analysis.
A general-purpose computational engine for data analysis, drake rebuilds intermediate data objects when their dependencies change, and it skips work when the results are already up to date. Not every execution starts from scratch, parallel and distributed computing are natively supported, and completed projects have tangible evidence that they are reproducible. Extensive documentation, from beginner-friendly tutorials to practical examples and more, is available at the reference website and in the online manual.
Bayesian data analysis usually incurs long runtimes and cumbersome custom code. A pipeline toolkit tailored to Bayesian statisticians, the jagstargets R package leverages targets and R2jags to ease this burden. jagstargets makes it easy to set up scalable JAGS pipelines that automatically parallelize the computation and skip expensive steps when the results are already up to date. Minimal custom code is required, and there is no need to manually configure branching, so usage is much easier than targets alone. For the underlying methodology, please refer to the documentation of targets doi:10.21105/joss.02959 and JAGS (Plummer 2003).
Search the Internet Archive, retrieve metadata, and download files.
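To illustrate the kind of normalization described above, here is a minimal Python sketch (not datefixR's own R API) that tries a list of candidate formats in turn and reports the row IDs it cannot parse. The format list and the `standardize_dates` helper are hypothetical choices for illustration; day-first order is assumed, and datefixR handles far more cases, including imputation.

```python
from datetime import date, datetime

# Hypothetical, deliberately small set of candidate formats (day-first assumed).
CANDIDATE_FORMATS = [
    "%Y-%m-%d",   # 2023-07-01
    "%d/%m/%Y",   # 01/07/2023
    "%d-%m-%Y",   # 01-07-2023
    "%d %b %Y",   # 01 Jul 2023 (month abbreviation, locale-dependent)
    "%d %B %Y",   # 01 July 2023 (full month name, locale-dependent)
    "%d/%m/%y",   # 01/07/23 (two-digit year)
]


def standardize_dates(records):
    """Convert (row_id, raw_string) pairs to datetime.date objects.

    Returns a dict of parsed dates keyed by row ID, plus a list of the
    row IDs that could not be standardized so they can be reported.
    """
    parsed, failed = {}, []
    for row_id, raw in records:
        text = raw.strip()
        for fmt in CANDIDATE_FORMATS:
            try:
                parsed[row_id] = datetime.strptime(text, fmt).date()
                break  # first matching format wins
            except ValueError:
                continue
        else:
            failed.append(row_id)  # no format matched
    return parsed, failed
```

Because the first matching format wins, the order of the list encodes the ambiguity policy (here, day before month).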
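The core idea of skipping up-to-date work can be sketched in a few lines of Python. The `TinyPipeline` class below is hypothetical and far simpler than drake: it fingerprints each target's command name and input values with SHA-256 and reruns the command only when that fingerprint changes. Unlike drake, it does not detect edits to a function's code.

```python
import hashlib
import json


class TinyPipeline:
    """Toy make-like runner: a target is rebuilt only when the fingerprint
    of its command name and upstream values changes (hypothetical design)."""

    def __init__(self):
        self.cache = {}      # target name -> (fingerprint, value)
        self.build_log = []  # targets actually (re)built, in order

    @staticmethod
    def _fingerprint(command_id, inputs):
        payload = json.dumps([command_id, inputs], sort_keys=True, default=repr)
        return hashlib.sha256(payload.encode()).hexdigest()

    def build(self, name, command, *deps):
        """Return `command(*deps)`, reusing the cached value if up to date."""
        fp = self._fingerprint(command.__qualname__, list(deps))
        cached = self.cache.get(name)
        if cached is not None and cached[0] == fp:
            return cached[1]  # up to date: skip the work
        value = command(*deps)
        self.cache[name] = (fp, value)
        self.build_log.append(name)
        return value
```

Calling `build` again with unchanged inputs returns the cached value without rerunning the command; changing an upstream input invalidates that target and every target that consumes its output.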
Helps store files as GitHub release assets, which is a convenient way for large/binary data files to piggyback onto public and private GitHub repositories. Includes functions for file downloads, uploads, and managing releases via the GitHub API.
Queries the Flickr API to return photograph metadata and download the images as JPEGs.
Provides tools for importing and working with bibliographic references. It greatly enhances the bibentry class by providing a class BibEntry which stores BibTeX and BibLaTeX references, supports UTF-8 encoding, and can be easily searched by any field, by date ranges, and by various formats for name lists (author by last names, translator by full names, etc.). Entries can be updated, combined, sorted, printed in a number of styles, and exported. BibTeX and BibLaTeX .bib files can be read into R and converted to BibEntry objects. Interfaces to NCBI Entrez, CrossRef, and Zotero are provided for importing references and references can be created from locally stored PDF files using Poppler. Includes functions for citing and generating a bibliography with hyperlinks for documents prepared with RMarkdown or RHTML.
Give advice about good practices when building R packages. Advice includes functions and syntax to avoid, package structure, code complexity, code formatting, etc.
Interface to the OpenStreetMap API for fetching and saving data from/to the OpenStreetMap database.
Provides functions to download and parse robots.txt files. Ultimately the package makes it easy to check if bots (spiders, crawlers, scrapers, …) are allowed to access specific resources on a domain.
Interface with and extract data from the United Nations Comtrade API. Comtrade provides country-level shipping data for a variety of commodities; these functions allow for easy API queries and return the data as a tidy data frame.
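Python's standard library ships a comparable robots.txt parser, `urllib.robotparser`, which makes the underlying check easy to see. This is an analogue, not the R package's API, and the robots.txt content, agent names, and URLs below are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: everyone is kept out of /private/, BadBot is kept out entirely.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse text already in hand, no network access


def allowed(agent, url):
    """True if `agent` may fetch `url` under the parsed rules."""
    return parser.can_fetch(agent, url)
```

A generic crawler may fetch `/index.html` but not `/private/data.csv`, while `BadBot` is refused everywhere because its own group disallows `/`.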
Simplified JSON document database access and manipulation, providing a common API across supported NoSQL databases Elasticsearch, CouchDB, MongoDB as well as SQLite/JSON1, PostgreSQL, and DuckDB.
Access and tidy up data from the ODK Central API. ODK Central is a clearinghouse for digitally captured data using ODK. It manages user accounts and permissions, stores form definitions, and allows data collection clients like ODK Collect to connect to it for form download and submission upload.
R bindings to rlite. rlite is a “self-contained, serverless, zero-configuration, transactional redis-compatible database engine. rlite is to Redis what SQLite is to SQL.”
Allows users to fit a cosinor model using the glmmTMB framework. This extends existing cosinor modeling packages, including cosinor and circacompare, by including a wide range of available link functions and the capability to fit mixed models. The cosinor model is described by Cornelissen (2014) doi:10.1186/1742-4682-11-16.
Simple git client for R based on libgit2 with support for SSH and HTTPS remotes. All functions in gert use basic R data types (such as vectors and data frames) for their arguments and return values. User credentials are shared with command line git through the git-credential store and ssh keys stored on disk or ssh-agent.
Simple interface to query and fetch gitignore templates that can be included in a .gitignore file. More than 450 templates are currently available.
A polyhedra database scraped from various sources, provided as R6 objects with rgl visualization capabilities.
treeio is an R package that makes it easier to import and store phylogenetic trees with associated data, and to link external data from different sources to a phylogeny. It also supports exporting phylogenetic trees with heterogeneous associated data to a single tree file, and can serve as a platform for merging trees with associated data and converting file formats.
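For reference, one common parameterization of the single-component cosinor model (see Cornelissen 2014) shows why it fits naturally into a generalized linear (mixed) model framework: after expanding the cosine, the model is linear in its coefficients. Here $M$ is the MESOR, $A$ the amplitude, $\phi$ the acrophase, and $\tau$ the period, assumed known. In a GLMM setting the right-hand side (without $\varepsilon$) would model the linear predictor $g(\mathbb{E}[y])$ for a chosen link $g$, and random effects could in principle be attached to $M$, $\beta$, and $\gamma$.

```latex
\begin{aligned}
y(t) &= M + A\cos\!\left(\frac{2\pi t}{\tau} + \phi\right) + \varepsilon \\
     &= M + \beta\cos\!\left(\frac{2\pi t}{\tau}\right) + \gamma\sin\!\left(\frac{2\pi t}{\tau}\right) + \varepsilon,
     \qquad \beta = A\cos\phi,\quad \gamma = -A\sin\phi, \\
A    &= \sqrt{\beta^{2} + \gamma^{2}}, \qquad \phi = \operatorname{atan2}(-\gamma,\ \beta).
\end{aligned}
```

The amplitude and acrophase are thus recovered from the two fitted linear coefficients $\beta$ and $\gamma$.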
Download and explore datasets from UCSC Xena data hubs, which are a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others. Databases are normalized so they can be combined, linked, filtered, explored and downloaded.
Provides R with the Glottolog database and some additional abilities for the purposes of linguistic mapping. The Glottolog database contains a catalogue of the languages of the world. This package helps researchers make linguistic maps, following the philosophy of the Cross-Linguistic Linked Data project, which facilitates uniform access to data across publications. A tutorial for this package is available on GitHub Pages and in the package vignette. Maps created by this package can be used both for research and for teaching linguistics. In addition, the package can download data from typological databases such as WALS, AUTOTYP and some others, and can be used to create your own database website.
Facilitates the gathering of biodiversity occurrence data from disparate sources. Metadata is managed throughout the process to facilitate reporting and enhanced ability to repeat analyses.
Extract Text from Rich Text Format (RTF) Documents
Read, Tidy, and Display Data from Microtiter Plates
Automatic Package Testing
Facilitates the automatic detection of acoustic signals, providing functions to diagnose and optimize the performance of detection routines. Detections from other software can also be explored and optimized. This package has been peer-reviewed by rOpenSci. Araya-Salas et al. (2022) doi:10.1101/2022.12.13.520253.
Time series toolkit with identical behavior for all time series classes: ts, xts, data.frame, data.table, tibble, zoo, timeSeries, tsibble, tis or irts. Also converts reliably between these classes.
Retrieve, map and summarize data from the archives. Functions allow searching by many parameters, including taxonomic names, places, and dates. In addition, there is an interface for conducting spatially delimited searches, and another for requesting large datasets via email.
Tidy tools for NetCDF data sources. Explore the contents of a NetCDF source (file or URL) presented as variables organized by grid with a database-like interface. The hyper_filter() interactive function translates the filter value or index expressions to array-slicing form. No data is read until explicitly requested, as a data frame or list of arrays via hyper_tibble() or hyper_array().
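The translation from a filter expression to array-slicing form can be illustrated for a single dimension. The Python helper below is hypothetical, not tidync's implementation: it evaluates a predicate on a 1-D coordinate vector and returns the smallest contiguous index slice covering every match, which corresponds to the start/count pair a NetCDF read needs.

```python
def filter_to_slice(coords, predicate):
    """Translate a predicate on a 1-D coordinate axis into an index slice.

    Returns the smallest contiguous slice that covers every index whose
    coordinate satisfies `predicate`, or None if nothing matches. If the
    matches are not contiguous, the slice also covers the gaps between them.
    """
    hits = [i for i, value in enumerate(coords) if predicate(value)]
    if not hits:
        return None
    return slice(hits[0], hits[-1] + 1)


# A 0.5-degree longitude axis: 0.0, 0.5, ..., 359.5
lon = [i * 0.5 for i in range(720)]
s = filter_to_slice(lon, lambda x: 10 <= x <= 20)
start, count = s.start, s.stop - s.start  # NetCDF-style start/count
```

For the condition 10 ≤ lon ≤ 20 this yields `slice(20, 41)`, i.e. start 20 and count 21, so only that block of the variable would need to be read.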
Facilitates searching, download and plotting of Water Framework Directive (WFD) reporting data for all waterbodies within the UK Environment Agency area. The types of data that can be downloaded are: WFD status classification data, Reasons for Not Achieving Good (RNAG) status, objectives set for waterbodies, measures put in place to improve water quality and details of associated protected areas. The data are made available under the Open Government Licence v3.0.
The CommonMark specification defines a rationalized version of markdown syntax. This package uses the cmark reference implementation for converting markdown text into various formats including HTML, LaTeX and groff man. In addition, it exposes the markdown parse tree in XML format. Also includes opt-in support for GFM extensions including tables, autolinks, and strikethrough text.
Taxonomic information from Wikipedia, Wikicommons, Wikispecies, and Wikidata. Functions are included for getting taxonomic information from each of the sources just listed, as well as for performing taxonomic search.
A set of convenience functions as well as geographical/political data about Nigeria, aimed at simplifying work with data and information that are specific to the country.
Functions and example data to support research into the slope (also known as longitudinal gradient or steepness) of linear geographic entities such as roads doi:10.1038/s41597-019-0147-x and rivers doi:10.1016/j.jhydrol.2018.06.066. The package was initially developed to calculate the steepness of street segments but can be used to calculate steepness of any linear feature that can be represented as LINESTRING geometries in the sf class system. The package takes two main types of input data for slope calculation: vector geographic objects representing linear features, and raster geographic objects with elevation values (which can be downloaded using functionality in the package) representing a continuous terrain surface. Where no raster object is provided the package attempts to download elevation data using the ceramic package.
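The basic quantity is rise over horizontal run along each segment of a line. The Python sketch below assumes the elevations have already been sampled at each vertex (the package can obtain them from a raster) and that coordinates are in a projected CRS with metre units. The length-weighted mean of absolute segment gradients is one reasonable whole-line summary, not necessarily the package's default.

```python
import math


def segment_gradients(points):
    """Gradient (rise / horizontal run) of each segment of a 3-D line.

    `points` is a list of (x, y, z) tuples in a projected CRS (metres).
    Zero-length segments are skipped.
    """
    grads, lengths = [], []
    for (x0, y0, z0), (x1, y1, z1) in zip(points, points[1:]):
        run = math.hypot(x1 - x0, y1 - y0)
        if run == 0:
            continue
        grads.append((z1 - z0) / run)
        lengths.append(run)
    return grads, lengths


def weighted_mean_abs_gradient(points):
    """Length-weighted mean absolute gradient of the whole line."""
    grads, lengths = segment_gradients(points)
    total = sum(lengths)
    if total == 0:
        raise ValueError("line has no segments of non-zero length")
    return sum(abs(g) * l for g, l in zip(grads, lengths)) / total
```

For a 200 m line that climbs 5 m over its first 100 m and is flat afterwards, the segment gradients are 0.05 (5%) and 0, and the weighted mean is 0.025.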
Provides functions to access historical and real-time national hydrometric data from Water Survey of Canada data sources and then applies tidy data principles.
Motivated by changing administrative boundaries over time, the nuts package can convert European regional data with NUTS codes between versions (2006, 2010, 2013, 2016 and 2021) and levels (NUTS 1, NUTS 2 and NUTS 3). The package uses spatial interpolation as in Lam (1983) doi:10.1559/152304083783914958 based on granular (100m x 100m) area, population and land use data provided by the European Commission’s Joint Research Center.
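A simplified version of this weighted conversion can be written in plain Python. In this sketch, which is an illustration under stated assumptions rather than the nuts implementation, each grid cell is tagged with its source-version region, its target-version region, and a weight such as area or population; a count-type (extensive) value is split across target regions in proportion to the weight each overlap carries. The region codes are hypothetical.

```python
from collections import defaultdict


def convert_extensive(values, cells):
    """Redistribute an extensive variable from source to target regions.

    values: dict source_code -> value (e.g. a count)
    cells:  iterable of (source_code, target_code, weight) for grid cells,
            where weight is e.g. cell area or population.
    """
    source_weight = defaultdict(float)   # total weight per source region
    overlap_weight = defaultdict(float)  # weight per (source, target) overlap
    for src, tgt, w in cells:
        source_weight[src] += w
        overlap_weight[(src, tgt)] += w

    result = defaultdict(float)
    for (src, tgt), w in overlap_weight.items():
        if src in values and source_weight[src] > 0:
            # Share of the source value proportional to the overlap's weight.
            result[tgt] += values[src] * w / source_weight[src]
    return dict(result)


cells = [("A", "X", 3), ("A", "Y", 1), ("B", "Y", 2)]
converted = convert_extensive({"A": 100, "B": 50}, cells)
```

Totals are preserved (150 in, 150 out). Rates and other intensive variables would instead need a weighted average rather than a weighted split.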
Set up and retrieve HTTPS and SSH credentials for use with git and other services. For HTTPS remotes the package interfaces with the git-credential utility, which git uses to store HTTP usernames and passwords. For SSH remotes we provide convenient functions to find or generate appropriate SSH keys. The package both helps the user set up a local git installation and provides a back-end for git/ssh client libraries to authenticate with existing user credentials.
Bindings to libfluidsynth to parse and synthesize MIDI files. It can read MIDI into a data frame, play it on the local audio device, or convert into an audio file.
Low level spell checker and morphological analyzer based on the famous hunspell library. The package can analyze or check individual words as well as parse text, latex, html or xml documents. For a more user-friendly interface use the spelling package which builds on this package to automate checking of files, documentation and vignettes in all common formats.
Content-preserving transformations of PDF files such as split, combine, and compress. This package interfaces directly to the qpdf C++ library and does not require any command line utilities. Note that qpdf does not read actual content from PDF files: to extract text and data you need the pdftools package.
Bindings to Google's C++ library Compact Language Detector 2. Probabilistically detects over 80 languages in plain text or HTML. For mixed-language input it returns the top three detected languages and their approximate proportion of the total classified text bytes (e.g. 80% English and 20% French out of 1000 bytes). There is also a cld3 package on CRAN which uses a neural network model instead.
Google's Compact Language Detector 3 is a neural network model for language identification and the successor of cld2 (available from CRAN). The algorithm is still experimental and takes a novel approach to language detection with different properties and outcomes. It can be useful to combine this with the Bayesian classifier results from cld2.
Bindings to the libgraphqlparser C++ library. Parses GraphQL syntax and exports the AST in JSON format.
Exposes some of the available OpenCV algorithms, such as a QR code scanner, and edge, body or face detection. These can either be applied to analyze static images, or to filter live video footage from a camera device.
Spell checking common document formats including latex, markdown, manual pages, and description files. Includes utilities to automate checking of documentation and vignettes as a unit test during R CMD check. Both British and American English are supported out of the box and other languages can be added. In addition, packages may define a wordlist to allow custom terminology without having to abuse punctuation.
Wraps the AntiWord utility to extract text from Microsoft Word documents. The utility only supports the old doc format, not the newer XML-based docx format. Use the xml2 package to read the latter.
Bindings to Tesseract: a powerful optical character recognition (OCR) engine that supports over 100 languages. The engine is highly configurable in order to tune the detection algorithms and obtain the best possible results.
JSON-LD is a light-weight syntax for expressing linked data. It is primarily intended for web-based programming environments, interoperable web services and for storing linked data in JSON-based databases. This package provides bindings to the JavaScript library for converting, expanding and compacting JSON-LD documents.
Generates simple and beautiful one-page HTML reference manuals with package documentation. Math rendering and syntax highlighting are done server-side in R such that no JavaScript libraries are needed in the browser, which makes the documentation portable and fast to load.
Convert latex math expressions to HTML and MathML for use in markdown documents or package manual pages. The rendering is done in R using the V8 engine (i.e. server-side), which eliminates the need for embedding the MathJax library into your web pages. In addition a math-to-rd wrapper is provided to automatically render beautiful math in R documentation files.
Read and write Frictionless Data Packages. A Data Package is a simple container format and standard to describe and package a collection of (tabular) data. It is typically used to publish FAIR and open datasets.
This web client interfaces Unpaywall, formerly oaDOI, a service finding free full-texts of academic papers by linking DOIs with open access journals and repositories. It provides unified access to various data sources for open access full-text links including Crossref and the Directory of Open Access Journals (DOAJ). API usage is free and no registration is required.
Utilities based on libpoppler for extracting text, fonts, attachments and metadata from a PDF file. Also supports high quality rendering of PDF documents into PNG, JPEG, TIFF format, or into raw bitmap vectors for further processing in R.
Renders vector-based svg images into high-quality custom-size bitmap arrays using librsvg2. The resulting bitmap can be written to e.g. png, jpeg or webp format. In addition, the package can convert images directly to various formats such as pdf or postscript.
Connect to a remote server over SSH to transfer files via SCP, set up a secure tunnel, or run a command or script on the host while streaming stdout and stderr directly to the client.
Provides convenient access to more than 17 million records from the Chilean Census 2017 database. The datasets were imported from the official DVD provided by the Chilean National Bureau of Statistics (INE) using the REDATAM converter created by Pablo De Grande; the package also includes the maps accompanying these datasets. The package is intentionally documented in ASCII-only Spanish so that it works without problems on different platforms.
A framework to help construct R data packages in a reproducible manner. Potentially time consuming processing of raw data sets into analysis ready data sets is done in a reproducible manner and decoupled from the usual R CMD build process so that data sets can be processed into R objects in the data package and the data package can then be shared, built, and installed by others without the need to repeat computationally costly data processing. The package maintains data provenance by turning the data processing scripts into package vignettes, as well as enforcing documentation and version checking of included data objects. Data packages can be version controlled on GitHub, and used to share data for manuscripts, collaboration and reproducible research.
Handling taxonomic lists through objects of class taxlist. This package provides functions to import species lists from Turboveg and to create backups of the resulting R objects. Quick displays are also implemented as summary methods.
A flexible tool for generating bespoke air transport statistics for urban studies based on publicly available data from the Bureau of Transportation Statistics (BTS) in the United States.
Provides functions supporting the reading and parsing of internal e-book content from EPUB files. E-book metadata and text content are parsed separately and joined together in a tidy, nested tibble data frame. E-book formatting is not completely standardized across all literature. It can be challenging to curate parsed e-book content across an arbitrary collection of e-books perfectly and in completely general form, to yield a singular, consistently formatted output. Many EPUB files do not even contain all the same pieces of information in their respective metadata. EPUB file parsing functionality in this package is intended for relatively general application to arbitrary EPUB e-books. However, poorly formatted e-books or e-books with highly uncommon formatting may not work with this package. There may even be cases where an EPUB file has DRM or some other property that makes it impossible to read with epubr. Text is read as is for the most part. The only nominal changes are minor substitutions, for example curly quotes changed to straight quotes. Substantive changes are expected to be performed subsequently by the user as part of their text analysis. Additional text cleaning can be performed at the user's discretion, such as with functions from packages like tm or qdap.
Creates geographic map tiles from geospatial map files or non-geographic map tiles from simple image files. This package provides a tile generator function for creating map tile sets for use with packages such as leaflet. In addition to generating map tiles based on a common raster layer source, it also handles the non-geographic edge case, producing map tiles from arbitrary images. These map tiles, which have a non-geographic, simple coordinate reference system (CRS), can also be used with leaflet when applying the simple CRS option. Map tiles can be created from an input file with any of the following extensions: tif, grd and nc for spatial maps and png, jpg and bmp for basic images. This package requires Python and the gdal library for Python. Windows users are recommended to install OSGeo4W as an easy way to obtain the required gdal support for Python.
Provides functions to import survey results directly into R using the Qualtrics API. Qualtrics is an online survey and data collection software platform. This package is community-maintained and is not officially supported by Qualtrics.
Manage jobs and builds on your Jenkins CI server. Create and edit projects, schedule builds, manage the queue, download build logs, and much more.
BEAST2 is a widely used Bayesian phylogenetic tool that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. BEAST2 is commonly accompanied by BEAUti 2, which, among other things, allows one to install BEAST2 packages. This package allows users to work with BEAST2 packages from R.
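Map tile sets like those produced here are addressed by zoom level and tile column/row. As an illustration (not this package's own code), the Python function below computes which tile contains a longitude/latitude point in the Web Mercator "XYZ" scheme that Leaflet uses by default, with rows counted from the north edge. Tile sets written in the TMS convention count rows from the south instead, so their row index is 2^zoom - 1 - y.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ (slippy map) tile containing a point.

    Web Mercator tiling: 2**zoom tiles per axis, x from the west edge,
    y from the north edge. Indices are clamped to the valid range, so
    points on the east/south edge or beyond about +/-85.05 degrees latitude
    fall into the outermost tiles.
    """
    n = 2 ** zoom
    lat_rad = math.radians(lat)
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return max(0, min(x, n - 1)), max(0, min(y, n - 1))
```

At zoom 1 the world is split into four tiles, so the point (0, 0) lands in tile (1, 1); a point at longitude 13.4, latitude 52.5 lands in tile (550, 335) at zoom 10.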
A universal client for depositing and accessing research data anywhere. Currently supported services are zenodo and figshare.
Set up, run and analyze NetLogo model simulations in R. nlrx experiments use a similar structure to NetLogo's Behavior Space experiments. However, nlrx offers more flexibility and additional tools for running and analyzing complex simulation designs and sensitivity analyses. The user defines all information that is needed in an intuitive framework, using class objects. Experiments are submitted from R to NetLogo via XML files that are dynamically written, based on specifications defined by the user. By nesting model calls in future environments, large simulation designs with many runs can be executed in parallel. This also enables simulating NetLogo experiments on remote high performance computing machines. In order to use this package, Java and NetLogo (>= 5.3.1) need to be available on the executing system.
Generate reports that enable quick visual review of temporal shifts in record-level data. Time series plots showing aggregated values are automatically created for each data field (column) depending on its contents (e.g. min/max/mean values for numeric data, no. of distinct values for categorical data), as well as overviews for missing values, non-conformant values, and duplicated rows. The resulting reports are shareable and can contribute to forming a transparent record of the entire analysis process. It is designed with Electronic Health Records in mind, but can be used for any type of record-level temporal data (i.e. tabular data where each row represents a single “event”, one column contains the “event date”, and other columns contain any associated values for the event).
Tools to discover hydrological data, accessing catalogues and databases from various data providers. The package is described in Vitolo (2017) “hddtools: Hydrological Data Discovery Tools” doi:10.21105/joss.00056.
This script comprises the functions that are used to clean up endoscopy reports and pathology reports, as well as many of the scripts used for analysis. The scripts assume that the endoscopy and histopathology datasets have already been merged, but they can of course also be used with the unmerged datasets.
Provides neutral landscape models (doi:10.1007/BF02275262). Neutral landscape models range from “hard” neutral models (completely randomly distributed) to “soft” neutral models (definable spatial characteristics), and generate landscape patterns that are independent of ecological processes. Thus, these patterns can be used as null models in landscape ecology. NLMR combines a large number of algorithms from other published software for simulating neutral landscapes. The simulation results are obtained in a spatial data format (raster* objects from the raster package) and can, therefore, be used in any sort of raster data operation that is performed with standard observation data.
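As an illustration of a “hard” neutral model, the following Python sketch fills a grid with independent uniform random values and then classifies it into habitat and non-habitat cells. It is a generic example of the idea, not one of the algorithms NLMR wraps; the names `random_landscape` and `classify_habitat` are hypothetical.

```python
import random


def random_landscape(nrow, ncol, seed=None):
    """A "hard" neutral landscape: each cell is an independent uniform draw in [0, 1)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(ncol)] for _ in range(nrow)]


def classify_habitat(landscape, proportion):
    """Turn a continuous landscape into a binary habitat map (1 = habitat).

    The cells with the highest values become habitat, so that (ties aside)
    the share of habitat cells equals `proportion`, rounded to whole cells.
    """
    values = sorted(v for row in landscape for v in row)
    n_habitat = round(proportion * len(values))
    if n_habitat == 0:
        return [[0] * len(row) for row in landscape]
    threshold = values[-n_habitat]  # smallest value still counted as habitat
    return [[1 if v >= threshold else 0 for v in row] for row in landscape]


land = random_landscape(20, 20, seed=1)
habitat = classify_habitat(land, 0.3)
print(sum(map(sum, habitat)), "habitat cells out of", 20 * 20)
```

Because no cell depends on its neighbours, any spatial structure found in such a map arises by chance alone, which is what makes it usable as a null model.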
Access UK official statistics from the Nomis database. Nomis includes data from the Census, the Labour Force Survey, DWP benefit statistics and other economic and demographic data from the Office for National Statistics, based around statistical geographies. See the Nomis website for full API documentation.
Interface to access data via the United States Department of Agriculture's National Agricultural Statistics Service (NASS) 'Quick Stats' web API. Convenience functions facilitate building queries based on available parameters and valid parameter values. This product uses the NASS API but is not endorsed or certified by NASS.
Access Open Trade Statistics API from R to download international trade data.
Tools for transport planning with an emphasis on spatial transport data and non-motorized modes. The package was originally developed to support the Propensity to Cycle Tool, a publicly available strategic cycle network planning tool (Lovelace et al. 2017) doi:10.5198/jtlu.2016.862, but has since been extended to support public transport routing and accessibility analysis (Moreno-Monroy et al. 2017) doi:10.1016/j.jtrangeo.2017.08.012 and routing with locally hosted routing engines such as OSRM (Lowans et al. 2023) doi:10.1016/j.enconman.2023.117337. The main functions are for creating and manipulating geographic “desire lines” from origin-destination (OD) data (building on the od package); calculating routes on the transport network locally and via interfaces to routing services (Desjardins et al. 2021) doi:10.1007/s11116-021-10197-1; and calculating route segment attributes such as bearing. The package implements the travel flow aggregation method described in Morgan and Lovelace (2020) doi:10.1177/2399808320942779 and the OD jittering method described in Lovelace et al. (2022) doi:10.32866/001c.33873. Further information on the package’s aim and scope can be found in the vignettes, in a paper in the R Journal (Lovelace and Ellison 2018) doi:10.32614/RJ-2018-053, and in a paper outlining the landscape of open source software for geographic methods in transport planning (Lovelace, 2021) doi:10.1007/s10109-020-00342-2.
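A desire line is a straight line between the centroids of an origin zone and a destination zone, carrying the flow between them. The Python sketch below assumes OD data as `(origin, destination, trips)` tuples and zone centroids in projected coordinates; the function `desire_lines` and its `oneway` option are hypothetical and do not reproduce the package's R API.

```python
from collections import defaultdict
from math import dist


def desire_lines(od_rows, centroids, oneway=True):
    """Build straight "desire lines" from origin-destination (OD) flows.

    od_rows: iterable of (origin, destination, trips).
    centroids: mapping zone -> (x, y) in a projected coordinate system.
    With oneway=True, A->B and B->A flows are summed into one line.
    """
    flows = defaultdict(float)
    for origin, destination, trips in od_rows:
        key = tuple(sorted((origin, destination))) if oneway else (origin, destination)
        flows[key] += trips

    lines = []
    for (origin, destination), trips in sorted(flows.items()):
        start, end = centroids[origin], centroids[destination]
        lines.append({
            "from": origin,
            "to": destination,
            "trips": trips,
            "geometry": (start, end),    # two-point line string
            "length": dist(start, end),  # Euclidean length in map units
        })
    return lines


od = [("A", "B", 10), ("B", "A", 5), ("A", "C", 3)]
centroids = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (0.0, 1.0)}
for line in desire_lines(od, centroids):
    print(line["from"], line["to"], line["trips"], line["length"])
```

Routing each line on the actual network, rather than drawing it straight, is the step that turns desire lines into routes.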
These sample data sets are intended for historians learning R. They include population, institutional, religious, military, and prosopographical data suitable for mapping, quantitative analysis, and network analysis.
Tools to parse and organize reference records downloaded from the Web of Science citation database into an R-friendly format, disambiguate the names of authors, geocode their locations, and generate/visualize coauthorship networks. This package has been peer-reviewed by rOpenSci (v. 1.0).
Make complex, interactive heatmaps. iheatmapr includes a modular system for iteratively building up complex heatmaps, as well as the iheatmap() function for making relatively standard heatmaps.
Parse a BibTeX file to a data.frame to make it accessible for further analysis and visualization.
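To show what parsing a BibTeX file into a table involves, here is a deliberately small Python sketch that turns each entry into one row (a dict of type, citation key and fields). It is not the package's parser: it handles brace-delimited, quoted and bare values, including nested braces, but assumes there are no `@string` macros, `#` concatenation or `@comment` blocks. The names `parse_bibtex` and `_read_value` are hypothetical.

```python
import re

ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
FIELD_RE = re.compile(r"\s*(\w+)\s*=\s*")


def _read_value(text, i):
    """Read one field value starting at text[i]; return (value, next_index)."""
    if i >= len(text):
        return "", i
    if text[i] == "{":
        depth = 0
        for j in range(i, len(text)):
            if text[j] == "{":
                depth += 1
            elif text[j] == "}":
                depth -= 1
                if depth == 0:
                    return text[i + 1:j], j + 1
        raise ValueError("unbalanced braces in BibTeX value")
    if text[i] == '"':
        j = text.index('"', i + 1)
        return text[i + 1:j], j + 1
    bare = re.match(r"[^,}\s]*", text[i:])  # e.g. year = 2020
    return bare.group(0), i + bare.end()


def parse_bibtex(text):
    """Return one dict per entry: its type, citation key and fields."""
    rows, pos = [], 0
    while (match := ENTRY_RE.search(text, pos)):
        row = {"type": match.group(1).lower(), "key": match.group(2)}
        pos = match.end()
        while True:
            while pos < len(text) and text[pos] in " \t\r\n,":
                pos += 1  # skip separators between fields
            if pos >= len(text) or text[pos] == "}":
                pos += 1  # step past the closing brace of the entry
                break
            field = FIELD_RE.match(text, pos)
            if field is None:
                break  # malformed input: give up on the rest of this entry
            value, pos = _read_value(text, field.end())
            row[field.group(1).lower()] = " ".join(value.split())
        rows.append(row)
    return rows
```

A list of such dicts maps directly onto a data frame with one row per entry and one column per field name.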
Tools to download and manipulate the Permanent Household Survey from Argentina (EPH is the Spanish acronym for Permanent Household Survey). For example, get_microdata() downloads the datasets, get_poverty_lines() downloads the official poverty baskets, and calculate_poverty() determines whether a household is in poverty, following the official methodology. organize_panels() is used to concatenate observations from different periods, and organize_labels() adds the official labels to the data. The implemented methods are based on INDEC (2016). As this package works with the Argentinian Permanent Household Survey and its main audience is from this country, the documentation was written in Spanish.
The Resource Description Framework, or RDF, is a widely used data representation model that forms the cornerstone of the Semantic Web. RDF represents data as a graph rather than the familiar data table or rectangle of relational databases. The rdflib package provides a friendly and concise user interface for performing common tasks on RDF data, such as reading, writing and converting between the various serializations of RDF data, including rdfxml, turtle, nquads, ntriples, and json-ld; creating new RDF graphs; and performing graph queries using SPARQL. This package wraps the low-level redland R package, which provides direct bindings to the redland C library. Additionally, the package supports the newer and more developer-friendly JSON-LD format through the jsonld package. The package interface takes inspiration from the Python rdflib library.
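The graph model can be illustrated without any RDF library: a graph is a set of (subject, predicate, object) triples, a basic query is a triple pattern with wildcards, and N-Triples writes one line per triple. The Python sketch below is a simplification for illustration only, not the API of the R or Python rdflib packages; it assumes that any term starting with `http://` or `https://` is an IRI and treats everything else as a plain string literal, and the `http://example.org/` namespace is a placeholder.

```python
EX = "http://example.org/"  # placeholder namespace for the example


def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts like a SPARQL variable."""
    return [
        t for t in graph
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


def term(value):
    """Serialize a term: IRIs in angle brackets, everything else as a plain literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(graph):
    """Write the graph as N-Triples: one 'subject predicate object .' line per triple."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in sorted(graph))


graph = {
    (EX + "alice", EX + "knows", EX + "bob"),
    (EX + "alice", EX + "name", "Alice"),
    (EX + "bob", EX + "name", "Bob"),
}
# Roughly: SELECT ?s ?name WHERE { ?s ex:name ?name }
for s, _, name in sorted(match(graph, p=EX + "name")):
    print(s, name)
print(to_ntriples(graph))
```

Real SPARQL engines join several such patterns on shared variables; a single pattern is the smallest building block of that process.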
BEAST2 is a widely used Bayesian phylogenetic tool that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. mcbette allows users to do a Bayesian model comparison over some site and clock models, using babette.
There are many typical tasks that have to be solved during phonetic research and experiments. These include creating a presentation that contains all stimuli, renaming and concatenating multiple sound files recorded during a session, automatic annotation in Praat TextGrids (one of the sound annotation standards provided by the Praat software; see Boersma & Weenink 2020), creating an HTML table with annotations and spectrograms, and converting between multiple formats (Praat TextGrid, ELAN, EXMARaLDA, Audacity, subtitles .srt, and FLEx flextext). All of these tasks can be solved by a mixture of different tools (any programming language has programs for automatic renaming, and Praat contains scripts for concatenating and renaming files, etc.). phonfieldwork provides functionality that makes it easier to solve those tasks independently of any additional tools. You can also compare its functionality with that of other packages: rPraat, textgRid.
This is the R implementation of Karel the robot, a programming language created by Dr. R. E. Pattis at Stanford University in 1981. Karel is a useful tool for teaching introductory programming concepts, such as algorithmic decomposition, conditional statements and loops, in an interactive and fun way, by writing programs that make Karel the robot achieve certain tasks in the world she lives in. Originally based on Pascal, Karel has been implemented in many languages over the decades, including Java, C++, Ruby and Python. This is the first package implementing Karel in R.
Provides an R interface to the Data Retriever via the Data Retriever’s command line interface. The Data Retriever automates the tasks of finding, downloading, and cleaning public datasets, and then stores them in a local database.
A simple HTTP client, with tools for making HTTP requests and mocking HTTP requests. The package is built on R6, and takes inspiration from Ruby's faraday gem. The package name is a play on curl, the widely used command line tool for HTTP, and this package is built on top of the R package curl, an interface to libcurl.
Estimates when and where a model-guided treatment strategy may outperform a treat-all or treat-none approach by Monte Carlo simulation and evaluation of the Net Monetary Benefit. Details can be viewed in Parsons et al. (2023) doi:10.21105/joss.05328.
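The net monetary benefit (NMB) of a strategy is the willingness to pay per QALY multiplied by the QALYs gained, minus the costs. The Python sketch below compares treat-all, treat-none and a model-guided rule by Monte Carlo simulation under a deliberately simple, hypothetical patient model: a binary event that costs QALYs and money, a treatment that prevents it with a fixed probability, and a noisy risk score that is thresholded to decide treatment. All function names, parameter names and values are illustrative assumptions, not the package's interface or the simulation design of Parsons et al. (2023).

```python
import random
from statistics import mean


def patient_nmb(has_event, treated, wtp, qaly_loss, event_cost, treat_cost, effectiveness):
    """Expected NMB for one patient, relative to a healthy, untreated baseline.

    NMB = wtp * (QALYs gained) - cost; only losses are counted, so values are <= 0.
    """
    if has_event:
        # Treatment prevents the event with probability `effectiveness`.
        p_event = 1.0 - effectiveness if treated else 1.0
    else:
        p_event = 0.0
    qalys = -qaly_loss * p_event
    cost = event_cost * p_event + (treat_cost if treated else 0.0)
    return wtp * qalys - cost


def simulate(n, prevalence, threshold, seed=0, **params):
    """Monte Carlo comparison of treat-all, treat-none and model-guided treatment."""
    rng = random.Random(seed)
    nmb = {"treat_all": [], "treat_none": [], "model_guided": []}
    for _ in range(n):
        has_event = rng.random() < prevalence
        # Hypothetical prediction model: a noisy risk score, higher on average for true cases.
        score = rng.gauss(0.7 if has_event else 0.3, 0.15)
        nmb["treat_all"].append(patient_nmb(has_event, True, **params))
        nmb["treat_none"].append(patient_nmb(has_event, False, **params))
        nmb["model_guided"].append(patient_nmb(has_event, score >= threshold, **params))
    return {strategy: mean(values) for strategy, values in nmb.items()}


result = simulate(
    10_000, prevalence=0.1, threshold=0.5,
    wtp=30_000, qaly_loss=0.05, event_cost=5_000, treat_cost=200, effectiveness=0.6,
)
print(result, "best:", max(result, key=result.get))
```

Repeating the simulation over a grid of prevalences, costs or thresholds is one way to map out where the model-guided strategy comes out ahead.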
Provides API access to data from the U.S. Energy Information Administration (EIA). Use of the EIA’s API and this package requires a free API key obtainable from the EIA. This package includes functions for searching the EIA data directory and returning time series and geoset time series datasets. Datasets returned by these functions are provided by default in a tidy format, or alternatively, in more raw formats. It also offers helper functions for working with EIA date strings and time formats and for inspecting different summaries of series metadata. The package also provides control over API key storage and caching of API request results.
Edit and validate taxonomic data in compliance with Darwin Core standards (Darwin Core Taxon class).
A pipeline for the identification, within taxonomic groups, of orthologous sequence clusters from GenBank as the first step in a phylogenetic analysis. The pipeline depends on a local alignment search tool and is, therefore, not dependent on differences in gene naming conventions and naming errors.
R interface to the Greek National Data Bank for Hydrological and Meteorological Information. It covers Hydroscope’s data sources and provides functions to transliterate, translate and download them into tidy dataframes.
Allows automating the creation of time series of rasters derived from MODIS satellite land products data. It performs several typical preprocessing steps such as download, mosaicking, reprojecting and resizing data acquired on a specified time period. All processing parameters can be set using a user-friendly GUI. Users can select which layers of the original MODIS HDF files they want to process, which additional quality indicators should be extracted from aggregated MODIS quality assurance layers and, in the case of surface reflectance products, which spectral indexes should be computed from the original reflectance bands. For each output layer, outputs are saved as single-band raster files corresponding to each available acquisition date. Virtual files allowing access to the entire time series as a single file are also created. Command-line execution exploiting a previously saved processing options file is also possible, allowing users to automatically update time series related to a MODIS product whenever a new image is available. For additional documentation refer to the following article: Busetto and Ranghetti (2016) doi:10.1016/j.cageo.2016.08.020.
Create preliminary exploratory data visualisations of an entire dataset to identify problems or unexpected features using ggplot2.
This tool is for parsing public drug databases such as the DrugBank XML database. The parsed data are then returned in a proper R object called dvobject.
In computationally demanding data analysis pipelines, the targets R package (2021, doi:10.21105/joss.02959) maintains an up-to-date set of results while skipping tasks that do not need to rerun. This process increases speed and increases trust in the final end product. However, it also overwrites old output with new output, and past results disappear by default. To preserve historical output, the gittargets package captures version-controlled snapshots of the data store, and each snapshot links to the underlying commit of the source code. That way, when the user rolls back the code to a previous branch or commit, gittargets can recover the data contemporaneous with that commit so that all targets remain up to date.
Call Google Cloud machine learning APIs for text and speech tasks. Call the Cloud Translation API for detection and translation of text, the Natural Language API to analyse text for sentiment, entities or syntax, the Cloud Speech API to transcribe sound files to text and the Cloud Text-to-Speech API to turn text into sound files.
Functions for the import, transformation, and analysis of data from muscle physiology experiments. The work loop technique is used to evaluate the mechanical work and power output of muscle. Josephson (1985) doi:10.1242/jeb.114.1.493 modernized the technique for application in comparative biomechanics. Although our initial motivation was to provide functions to analyze work loop experiment data, as we developed the package we incorporated the ability to analyze data from experiments that are often complementary to work loops. There are currently three supported experiment types: work loops, simple twitches, and tetanus trials. Data can be imported directly from .ddf files or via an object constructor function. Through either method, data can then be cleaned or transformed via methods typically used in studies of muscle physiology. Data can then be analyzed to determine the timing and magnitude of force development and relaxation (for isometric trials) or the magnitude of work, net power, and instantaneous power among other things (for work loops). Although we do not provide plotting functions, all resultant objects are designed to be friendly to visualization via either base-R plotting or tidyverse functions. This package has been peer-reviewed by rOpenSci (v. 1.1.0).
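The net work done by the muscle in one work loop equals the area enclosed by the force-length loop, W = -∮ F dL, and net power is that work multiplied by the cycle frequency. The following Python sketch computes both with the trapezoidal rule for the samples of a single cycle; it is a generic illustration that assumes SI units and tensile force counted as positive, not the package's own functions, and the names `net_work` and `net_power` are hypothetical.

```python
def net_work(lengths, forces):
    """Net work done by the muscle over one closed work loop, W = -∮ F dL.

    lengths (m) and forces (N) are samples of one cycle in time order; the
    loop is closed by joining the last sample back to the first. Shortening
    under high force and lengthening under low force (a counter-clockwise
    loop with length on x and force on y) gives positive work in joules.
    """
    if len(lengths) != len(forces) or len(lengths) < 3:
        raise ValueError("need at least three paired length/force samples")
    total = 0.0
    n = len(lengths)
    for i in range(n):
        j = (i + 1) % n  # wrap around to close the loop
        # Trapezoidal rule for the integral of F dL over this segment.
        total += 0.5 * (forces[i] + forces[j]) * (lengths[j] - lengths[i])
    return -total


def net_power(lengths, forces, cycle_frequency):
    """Net power (W) = net work per cycle (J) x cycle frequency (Hz)."""
    return net_work(lengths, forces) * cycle_frequency


# Idealized rectangular loop: lengthen 1 mm at 0 N, then shorten 1 mm at 2 N.
lengths = [0.000, 0.001, 0.001, 0.000]
forces = [0.0, 0.0, 2.0, 2.0]
print(net_work(lengths, forces), net_power(lengths, forces, cycle_frequency=5.0))
```

Traversing the same loop in the opposite direction flips the sign, which corresponds to net work being done on the muscle rather than by it.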
Functions and helpers to import metadata, ngrams and full-texts delivered by Data for Research by JSTOR.
Use morphological image processing and edge detection algorithms to automatically measure tree ring widths on digital images. Users can also manually mark tree rings on species with complex anatomical structures. The arcs of inner-rings and angles of successive inclined ring boundaries are used to correct ring-width series. The package provides a Shiny-based application, allowing R beginners to easily analyze tree ring images and export ring-width series in standard file formats.
Functions for the retrieval, manipulation, and visualization of geospatial data, with an aim towards producing 3D landscape visualizations in the Unity 3D rendering engine. Functions are also provided for retrieving elevation data and base map tiles from the USGS National Map.
The phruta R package is designed to simplify the basic phylogenetic pipeline. Specifically, all code is run within the same program and data from intermediate steps are saved in independent folders. Furthermore, all code is run within the same environment which increases the reproducibility of your analysis. phruta retrieves gene sequences, combines newly downloaded and local gene sequences, and performs sequence alignments.
An interface for interacting with OSF. osfr enables you to access open research materials and data, or create and manage your own private or public projects.
Facilitates download of financial data from Yahoo Finance, a vast repository of stock price data across multiple financial exchanges. The package offers a local caching system and support for parallel computation.
BEAST2 is a widely used Bayesian phylogenetic tool that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. BEAST2 is commonly accompanied by BEAUti 2, Tracer and DensiTree. babette provides an alternative workflow to using all these tools separately. This allows doing complex Bayesian phylogenetics easily and reproducibly from R.
Estimate, report and combine felling dates of historical tree-ring series
Multiple Empirical Likelihood Tests
Semantically Rich I/O for the NeXML Format
Quantitative PCR Analysis with the Tidyverse
Interface to Virtuoso using ODBC
Dealing with Multiplatform Satellite Images
A High-Performance Local Taxonomic Database Interface
Assertive Programming for R Analysis Pipelines
A Test Environment for Database Requests
Interface to Chromosome Counts Database API
Fast, Consistent Tokenization of Natural Language Text
awardFindR
Market Structure, Concentration and Inequality Measures
Automated Cleaning of Occurrence Records from Biological Collections
Interface to the arXiv API
Download and Process Data from the Paleobiology Database
Fingertips Data for Public Health
Classes for Storing and Manipulating Taxonomic Data
A DoOR to the Complete Olfactome
Discovery, Access and Manipulation of TreeBASE Phylogenies
World Vector Map Data from Natural Earth Used in rnaturalearth
Download and Aggregate Data from Public Hire Bicycle Systems
A Unifying API for Calling the Unity 3D Video Game Engine
Label Creation for Tracking and Collecting Data from Biological Samples
Archive and Unarchive Databases Using Flat Files
World Register of Marine Species (WoRMS) Client
Mangal Client
An Interface for the eLTER Community
Parse Scientific Names
Estimate Avian Body Size Distributions
Text Interchange Format
Bibtex Parser
An R Client for the Europe PubMed Central RESTful Web Service. It gives access to both metadata on life science literature and open access full texts. Europe PMC indexes all PubMed content and other literature sources including Agricola, a bibliographic database of citations to the agricultural literature, or Biological Patents. In addition to bibliographic metadata, the client allows users to fetch citations and reference lists. Links between life-science literature and other EBI databases, including ENA, PDB or ChEMBL, are also accessible. No registration or API key is required. See the vignettes for usage examples.
An interface to the fast_matrix_market C++ library, this package offers efficient read and write operations for Matrix Market files in R. It supports both sparse and dense matrix formats. Peer-reviewed at rOpenSci.
Convert data to GeoJSON or TopoJSON from various R classes, including vectors, lists, data frames, shape files, and spatial classes. geojsonio does not aim to replace packages like sp, rgdal, rgeos, but rather aims to be a high level client to simplify conversions of data from and to GeoJSON and TopoJSON.
A high performance interface to the Global Biodiversity Information Facility, GBIF. In contrast to rgbif, which can access small subsets of GBIF data through web-based queries to a central server, gbifdb provides enhanced performance for R users performing large-scale analyses on servers and cloud computing providers, providing full support for arbitrary SQL or dplyr operations on the complete GBIF data tables (now over 1 billion records, and over a terabyte in size). gbifdb accesses a copy of the GBIF data in parquet format, which is already readily available in commercial computing clouds such as the Amazon Open Data portal and the Microsoft Planetary Computer, or can be accessed directly without downloading, or downloaded to any server with suitable bandwidth and storage space. The high-performance techniques for local and remote access are described in and respectively.
Downloads data supplementary to manuscripts, using the papers' DOIs as references. Facilitates open, reproducible research workflows: scientists re-analyzing published datasets can work with them as easily as if they were stored on their own computer, and others can track their analysis workflow painlessly. The main function suppdata() returns a (temporary) location on the user's computer where the file is stored, making it simple to use suppdata() with standard functions like read.csv().
There are a number of binary files associated with the Webdriver/Selenium project. This package provides functions to download these binaries and to manage processes involving them.
A client for the Environmental Data Initiative repository REST API. The EDI data repository is for publication and reuse of ecological data with emphasis on metadata accuracy and completeness. It is built upon the PASTA+ software stack and was developed in collaboration with the US LTER Network. EDIutils includes functions to search and access existing data, evaluate and upload new data, and assist other data management tasks common to repository users.
Get archived data of past and current hurricanes and tropical storms for the Atlantic and eastern Pacific oceans. Data are available for storms since 1998. Datasets are updated via the rrricanesdata package. Currently, the package's datasets total about 6 MB. See the README or view vignette("drat").
World Magnetic Model
Tracer from R
Avoid the Typical Working Directory Pain When Using knitr
Classes for GeoJSON
A Binary Download Manager
R Bindings for Selenium WebDriver
Get SNP (Single-Nucleotide Polymorphism) Data on the Web
Access the Global Plant Phenology Data Portal
Interface to the Open Tree of Life API
Get Texts from the Perseus Digital Library
Moving-Window and Direct Data Aggregation
AIMS Data Platform API Client
General Purpose Interface to Elasticsearch
Categorical Analysis of Neo- And Paleo-Endemism
Generate CRediT Author Statements
R Interface to Apache Tika
Interface to Phylocom
API Wrapper Around
Retrieve Data from the 1000 Plants Initiative (1KP)
Client for the Comprehensive Knowledge Archive Network (CKAN) API
Client for Various CrossRef APIs
Deterministic Categorization of Items Based on External Code Data
Wrangle, Analyze, and Visualize Animal Movement Data
Acquisition and Processing of NASA Soil Moisture Active-Passive (SMAP) Data
R Interface to the Species+ Database
Access for Dryad Web Services
General Purpose GraphQL Client
Client for the DataCite API
Access London Natural History Museum Host-Helminth Record Database
Fetch Phylogenies from Many Sources
Access to the Neotoma Paleoecological Database Through R
Client for the Pangaea Database
General Purpose Oai-PMH Services Client
Work with GitHub Gists
Landscape Utility Toolbox
Access NASA's Exoplanet Archive Data
Antarctic Geographic Place Names
EPA Data Helper for R
Download Time Series Data from
Access iNaturalist Data Through APIs
A Flexible Container to Transport and Manipulate Data and Associated Resources
Read and Write Ecological Metadata Language Files
Convert Among Citation Formats
A DoOR to the Complete Olfactome
Downloading Data from Symbiota2 Portals into R
Download Data from the European Social Survey on the Fly
Lightweight Qualitative Coding
Tree Biomass Estimation at Extra-Tropical Forest Plots
Historical and Contemporary Boundaries of the United States of America
Datasets for the USAboundaries package
Manipulation of Matched Phylogenies and Data using data.table
Create Lightweight Descriptions of Data
Convert Antipsychotic Doses to Chlorpromazine Equivalents
Classifies Image Pixels by Colour
Interface to USDA Databases
AAPOR Survey Outcome Rates
Entrez in R
Ecological Metadata as Linked Data
R Interface to the Global Population Dynamics Database
Obtain and Visualize Regulome-Gene Expression Correlations in Cancer
Positron Emission Tomography Time-Activity Curve Analysis
Conduct Co-Localization Analysis of Fluorescence Microscopy Images
Programmatic Interface to the API
Dendrograms for Evolutionary Analysis
Parse Full Text XML Documents from PubMed Central
Data for Atlantic and east Pacific tropical cyclones since 1998
Download and Read RAM Legacy Stock Assessment Database
Client for CAMS Radiation Service
Generate Starting Trees For Combined Molecular, Morphological and Stratigraphic Data
Popler R Package
The web service provides a number of spatial data queries, including administrative area hierarchies, city locations and some country postal code queries. A (free) username is required and rate limits exist.
