rOpenSci | Data Extraction

Data Extraction

Convert and Munge Data

Extract Tables from PDF Documents

Mauricio Vargas Sepulveda
Description

Bindings for the Tabula (https://tabula.technology/) Java library, which can extract tables from PDF files. This tool can reduce time and effort in data extraction processes in fields like investigative journalism. It allows for automatic and manual table extraction, the latter facilitated through a Shiny interface that lets the user select table areas with a computer mouse.

View Documentation

Standardize Dates in Different Formats or with Missing Data

Nathan Constantine-Cooke
Description

Dates are commonly represented in many different formats: the order of day, month, and year can differ; different separators ("-", "/", or whitespace) can be used; months can be given as numbers, names, or abbreviations; and years can be given as two digits or four. datefixR takes dates in all these formats and converts them to R's built-in Date class. If datefixR cannot standardize a date, for example because it is too malformed, the user is told which date cannot be standardized and the corresponding ID for the row. datefixR also allows the imputation of missing days and months with user-controlled behavior.
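The idea of trying several date layouts and reporting the row ID of anything that cannot be parsed can be illustrated with a short Python sketch. This is not datefixR's implementation or API: the function names, the list of layouts, and the day-first preference are all assumptions made for illustration, and month names are matched in the current locale (English is assumed here).

```python
from datetime import date, datetime

# Candidate layouts, tried in order. Day-first ordering is an assumption of
# this sketch, not a statement about datefixR's defaults.
FORMATS = [
    "%Y-%m-%d",
    "%d-%m-%Y", "%d-%m-%y",
    "%d-%B-%Y", "%d-%b-%Y",
    "%d-%B-%y", "%d-%b-%y",
    "%B-%d-%Y", "%b-%d-%Y",
]


def standardize(raw: str) -> date:
    """Parse one date string written with "-", "/", or whitespace separators."""
    # Collapse all supported separators into a single "-".
    cleaned = "-".join(raw.replace("/", " ").replace("-", " ").split())
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"cannot standardize date: {raw!r}")


def standardize_column(rows):
    """rows: iterable of (row_id, raw_date) pairs.

    Returns a dict of parsed dates keyed by row ID, plus a list of
    (row_id, raw_date) pairs that could not be standardized.
    """
    parsed, failures = {}, []
    for row_id, raw in rows:
        try:
            parsed[row_id] = standardize(raw)
        except ValueError:
            failures.append((row_id, raw))
    return parsed, failures
```

Calling `standardize_column([(1, "12 Jan 2001"), (2, "not a date")])` returns the parsed date for row 1 and lists row 2 among the failures, which mirrors the behavior described above of telling the user which row could not be standardized.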

View Documentation

JSON for Linking Data

Jeroen Ooms
Description

JSON-LD is a light-weight syntax for expressing linked data. It is primarily intended for web-based programming environments, interoperable web services and for storing linked data in JSON-based databases. This package provides bindings to the JavaScript library for converting, expanding and compacting JSON-LD documents.

View Documentation

High Performance CommonMark and Github Markdown Rendering in R

Jeroen Ooms
Description

The CommonMark specification defines a rationalized version of markdown syntax. This package uses the cmark reference implementation for converting markdown text into various formats including html, latex and groff man. In addition it exposes the markdown parse tree in xml format. Also includes opt-in support for GFM extensions including tables, autolinks, and strikethrough text.

View Documentation

Export Data Frames to Excel xlsx Format

Jeroen Ooms
Description

Zero-dependency data frame to xlsx exporter based on libxlsxwriter https://libxlsxwriter.github.io. Fast and no Java or Excel required.

View Documentation
Scientific use cases
  1. Garmendia, A., Raigón, M. D., Marques, O., Ferriol, M., Royo, J., & Merle, H. (2018). Effects of nettle slurry (Urtica dioica L.) used as foliar fertilizer on potato (Solanum tuberosum L.) yield and plant growth. PeerJ, 6, e4729. https://doi.org/10.7717/peerj.4729
  2. Garmendia, A., Merle, H., Ruiz, P., & Ferriol, M. (2018). Distribution and ecological segregation on regional and microgeographic scales of the diploid Centaurea aspera L., the tetraploid C. seridis L., and their triploid hybrids (Compositae). PeerJ, 6, e5209. https://doi.org/10.7717/peerj.5209
  3. Garmendia, A., Beltrán, R., Zornoza, C., Breijo, F., Reig, J., Bayona, I., & Merle, H. (2019). Insect repellent and chemical agronomic treatments to reduce seed number in ‘Afourer’ mandarin - Effect on yield and fruit diameter. Scientia Horticulturae, 246, 437–447. https://doi.org/10.1016/j.scienta.2018.11.025
  4. Ktenioudaki, A., O’Donnell, C. P., & do Nascimento Nunes, M. C. (2019). Modelling the biochemical and sensory changes of strawberries during storage under diverse relative humidity conditions. Postharvest Biology and Technology, 154, 148–158. https://doi.org/10.1016/j.postharvbio.2019.04.023
  5. Ayodele Benjamin, E., Vincent, E., Claudius, A., Olatomiwa, L., & Dickson, E. (2019). Data-based investigation on the performance of an independent Gas turbine for electricity generation using real power measurements and other closely related parameters. Data in Brief, 104444. https://doi.org/10.1016/j.dib.2019.104444
  6. Ehlers, M., Nold, J., Kuhn, M., Klingelhöfer-Jens, M., & Lonsdorf, T. (2020). Natural variations in brain morphology do not account for inter-individual differences in defensive responding during fear acquisition training and extinction. https://psyarxiv.com/q2kyf/download?format=pdf
  7. Wiley, M., & Wiley, J. F. (2020). Data Input and Output. Beginning R 4, 33–46. https://doi.org/10.1007/978-1-4842-6053-1_3
  8. Yan, T., Wang, Q., Maodzeka, A., Wu, D., & Jiang, L. (2020). BnaSNPDB: An interactive web portal for the efficient retrieval and analysis of SNPs among 1,007 rapeseed accessions. Computational and Structural Biotechnology Journal, 18, 2766–2773. https://doi.org/10.1016/j.csbj.2020.09.031
  9. Munzert, S., & Ramirez-Ruiz, S. (2020, October 10). Meta-Analysis of the Effects of Voting Advice Applications. https://doi.org/10.31219/osf.io/utdn4

Tools for Spell Checking in R

Jeroen Ooms
Description

Spell checking common document formats including latex, markdown, manual pages, and description files. Includes utilities to automate checking of documentation and vignettes as a unit test during R CMD check. Both British and American English are supported out of the box and other languages can be added. In addition, packages may define a wordlist to allow custom terminology without having to abuse punctuation.

View Documentation
Scientific use cases
  1. Luc, A., Lê, S., & Philippe, M. (2019). Nudging consumers for relevant data using Free JAR profiling: an application to product development. Food Quality and Preference, 103751. https://doi.org/10.1016/j.foodqual.2019.103751

Extensible Style-Sheet Language Transformations

Jeroen Ooms
Description

An extension for the xml2 package to transform XML documents by applying an xslt style-sheet.

View Documentation

Client for jq, a JSON Processor

Jeroen Ooms
Description

Client for jq, a JSON processor (https://jqlang.github.io/jq/), written in C. jq allows the following with JSON data: index into, parse, do calculations, cut up and filter, change key names and values, perform conditionals and comparisons, and more.

View Documentation

Store and Retrieve Data.frames in a Git Repository

Thierry Onkelinx
Description

The git2rdata package is an R package for writing and reading data frames as plain text files. A metadata file stores important information. 1) Storing metadata makes it possible to maintain the classes of variables. By default, git2rdata optimizes the data for file storage. The optimization is most effective on data containing factors, but it makes the data less human readable. Users can turn it off when they prefer a human-readable format over smaller files. Details on the implementation are available in vignette("plain_text", package = "git2rdata"). 2) Storing metadata also allows smaller row-based diffs between two consecutive commits. This is a useful feature when storing data as plain text files under version control. Details on this part of the implementation are available in vignette("version_control", package = "git2rdata"). Although we envisioned git2rdata with a git workflow in mind, you can use it in combination with other version control systems like Subversion or Mercurial. 3) git2rdata is a useful tool in a reproducible and traceable workflow. vignette("workflow", package = "git2rdata") gives a toy example. 4) vignette("efficiency", package = "git2rdata") provides some insight into the efficiency of file storage, git repository size, and speed for writing and reading.

View Documentation

Quantifying (Animal) Sound Degradation

Marcelo Araya-Salas
Description

Intended to facilitate acoustic analysis of (animal) sound transmission experiments, which typically aim to quantify changes in signal structure when transmitted in a given habitat by broadcasting and re-recording animal sounds at increasing distances. The package offers a workflow with functions to prepare the data set for analysis as well as to calculate and visualize several degradation metrics, including blur ratio, signal-to-noise ratio, excess attenuation, and envelope correlation, among others (Dabelsteen et al. 1993, doi:10.1121/1.406682).

View Documentation

Data Quality Reporting for Temporal Datasets

T. Phuong Quan
Description

Generate reports that enable quick visual review of temporal shifts in record-level data. Time series plots showing aggregated values are automatically created for each data field (column) depending on its contents (e.g. min/max/mean values for numeric data, no. of distinct values for categorical data), as well as overviews for missing values, non-conformant values, and duplicated rows. The resulting reports are shareable and can contribute to forming a transparent record of the entire analysis process. It is designed with Electronic Health Records in mind, but can be used for any type of record-level temporal data (i.e. tabular data where each row represents a single “event”, one column contains the “event date”, and other columns contain any associated values for the event).

View Documentation

Base Classes and Functions for Phylogenetic Tree Input and Output

Guangchuang Yu
Description

treeio is an R package that makes it easier to import and store phylogenetic trees with associated data, and to link external data from different sources to a phylogeny. It also supports exporting a phylogenetic tree with heterogeneous associated data to a single tree file, and can serve as a platform for merging trees with associated data and converting file formats.

View Documentation
Scientific use cases
  1. Yu, G., Tsan-Yuk Lam, T., Zhu, H., & Guan, Y. (2018). Two methods for mapping and visualizing associated data on phylogeny using ggtree. Molecular Biology and Evolution. https://doi.org/10.1093/molbev/msy194
  2. Paudyal, N., Pan, H., Elbediwi, M., Zhou, X., Peng, X., Li, X., … Yue, M. (2019). Characterization of Salmonella Dublin isolated from bovine and human hosts. BMC Microbiology, 19(1). https://doi.org/10.1186/s12866-019-1598-0
  3. Callanan, J., Stockdale, S. R., Shkoporov, A., Draper, L. A., Ross, R. P., & Hill, C. (2020). Expansion of known ssRNA phage genomes: From tens to over a thousand. Science Advances, 6(6), eaay5981. https://doi.org/10.1126/sciadv.aay5981
  4. Ahrenfeldt, J., Waisi, M., Loft, I. C., Clausen, P. T. L. C., Allesøe, R., Szarvas, J., … Lund, O. (2020). Metaphylogenetic analysis of global sewage reveals that bacterial strains associated with human disease show less degree of geographic clustering. Scientific Reports, 10(1). https://doi.org/10.1038/s41598-020-59292-w
  5. Ryt-Hansen, P., Pedersen, A. G., Larsen, I., Kristensen, C. S., Krog, J. S., Wacheck, S., & Larsen, L. E. (2020). Substantial Antigenic Drift in the Hemagglutinin Protein of Swine Influenza A Viruses. Viruses, 12(2), 248. https://doi.org/10.3390/v12020248
  6. Yu, G. (2020). Using ggtree to Visualize Data on Tree‐Like Structures. Current Protocols in Bioinformatics, 69(1). https://doi.org/10.1002/cpbi.96
  7. Lequime, S., Bastide, P., Dellicour, S., Lemey, P., & Baele, G. (2020). nosoi: a stochastic agent-based transmission chain simulation framework in R. https://doi.org/10.1101/2020.03.03.973107
  8. Bastide, P., Ho, L. S. T., Baele, G., Lemey, P., & Suchard, M. A. (2020). Efficient Bayesian Inference of General Gaussian Models on Large Phylogenetic Trees. arXiv preprint arXiv:2003.10336. https://arxiv.org/pdf/2003.10336
  9. Ordynets, A., Liebisch, R., Lysenko, L., Scherf, D., Volobuev, S., Saitta, A., … Langer, E. (2020). Morphologically similar but not closely related: the long-spored species of Subulicystidium (Trechisporales, Basidiomycota). Mycological Progress, 19(7), 691–703. https://doi.org/10.1007/s11557-020-01587-3
  10. Carroll, L. M., Huisman, J. S., & Wiedmann, M. (2020). Twentieth-century emergence of antimicrobial resistant human- and bovine-associated Salmonella enterica serotype Typhimurium lineages in New York State. Scientific Reports, 10(1). https://doi.org/10.1038/s41598-020-71344-9
  11. Whitmer, S. L. M., Lo, M. K., Sazzad, H. M. S., Zufan, S., Gurley, E. S., Sultana, S., … Klena, J. D. (2020). Inference of Nipah virus Evolution, 1999-2015. Virus Evolution. https://doi.org/10.1093/ve/veaa062
  12. Ettinger, C. L., & Eisen, J. A. (2020). Fungi, bacteria and oomycota opportunistically isolated from the seagrass, Zostera marina. PLOS ONE, 15(7), e0236135. https://doi.org/10.1371/journal.pone.0236135
  13. Huang, R., Soneson, C., Ernst, F. G. M., Rue-Albrecht, K. C., Yu, G., Hicks, S. C., & Robinson, M. D. (2020). TreeSummarizedExperiment: a S4 class for data with hierarchical structure. F1000Research, 9, 1246. https://doi.org/10.12688/f1000research.26669.1
  14. Figueroa, H., & Smith, S. A. (2020). A targeted phylogenetic approach helps explain New World functional diversity patterns of two eudicot lineages. Journal of Biogeography. https://doi.org/10.1111/jbi.13993
  15. Alvarado-Ortega, J., & Díaz-Cruz, J. A. (2021). Hastichthys totonacus sp. nov., a North American Turonian dercetid fish (Teleostei, Aulopiformes) from the Huehuetla quarry, Puebla, Mexico. Journal of South American Earth Sciences, 105, 102900. https://doi.org/10.1016/j.jsames.2020.102900
  16. Chak, S. T. C., Baeza, J. A., & Barden, P. (2020). Eusociality Shapes Convergent Patterns of Molecular Evolution across Mitochondrial Genomes of Snapping Shrimps. Molecular Biology and Evolution. https://doi.org/10.1093/molbev/msaa297
  17. Wagner, E., Zaiser, A., Leitner, R., Quijada, N. M., Pracser, N., Pietzka, A., … Rychli, K. (2020). Virulence characterization and comparative genomics of Listeria monocytogenes sequence type 155 strains. BMC Genomics, 21(1). https://doi.org/10.1186/s12864-020-07263-w
  18. Toparslan, E., Karabag, K., & Bilge, U. (2020). A workflow with R: Phylogenetic analyses and visualizations using mitochondrial cytochrome b gene sequences. PLOS ONE, 15(12), e0243927. https://doi.org/10.1371/journal.pone.0243927
  19. Oswald, K. N., Lee, A. T. K., & Smit, B. (2021). Seasonal metabolic adjustments in an avian evolutionary relict restricted to mountain habitat. Journal of Thermal Biology, 95, 102815. https://doi.org/10.1016/j.jtherbio.2020.102815
  20. Maruyama, H., Masago, A., Nambu, T., Mashimo, C., Takahashi, K., & Okinaga, T. (2020). Inter-site and interpersonal diversity of salivary and tongue microbiomes, and the effect of oral care tablets. F1000Research, 9, 1477. https://doi.org/10.12688/f1000research.27502.1
  21. Gates, M. W., Zhang, Y. M., & Buffington, M. L. (2020). The great greenbriers gall mystery resolved? New species of Aprostocetus Westwood (Hymenoptera, Eulophidae) gall inducer and two new parasitoids (Hymenoptera, Eurytomidae) associated with Smilax L. in southern Florida, USA. Journal of Hymenoptera Research, 80, 71–98. https://doi.org/10.3897/jhr.80.59466
  22. Sellés Vidal, L., Ayala, R., Stan, G.-B., & Ledesma-Amaro, R. (2021). rfaRm: An R client-side interface to facilitate the analysis of the Rfam database of RNA families. PLOS ONE, 16(1), e0245280. https://doi.org/10.1371/journal.pone.0245280
  23. Vozdova, M., Kubickova, S., Martínková, N., Galindo, D. J., Bernegossi, A. M., Cernohorska, H., … Rubes, J. (2021). Satellite DNA in Neotropical Deer Species. Genes, 12(1), 123. https://doi.org/10.3390/genes12010123

Munich ChronoType Questionnaire Tools

Daniel Vartanian
Description

A complete toolkit for processing the Munich ChronoType Questionnaire (MCTQ) in its three versions: standard, micro, and shift. The MCTQ is a quantitative and validated tool used to assess chronotypes based on individuals’ sleep behavior. It was originally presented by Till Roenneberg, Anna Wirz-Justice, and Martha Merrow in 2003 (doi:10.1177/0748730402239679).

View Documentation

Parse a BibTeX File to a Data Frame

Gianluca Baio
Description

Parse a BibTeX file to a data.frame to make it accessible for further analysis and visualization.

View Documentation
Scientific use cases
  1. Scharmüller, A., Schreiner, V. C., & Schäfer, R. B. (2020). Standartox: Standardizing Toxicity Data. Data, 5(2), 46. https://doi.org/10.3390/data5020046
  2. LeBeau, B. C., & Aloe, A. M. (2020). Evolution of Statistical Software and Quantitative Methods. https://doi.org/10.17077/pp.005273
  3. Benjamens, S., Banning, L. B., van den Berg, T. A., & Pol, R. A. (2020). Gender Disparities in Authorships and Citations in Transplantation Research. Transplantation direct, 6(11). https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7575186/

Tools to Manipulate and Query Semantic Data

Carl Boettiger
Description

The Resource Description Framework, or RDF, is a widely used data representation model that forms the cornerstone of the Semantic Web. RDF represents data as a graph rather than the familiar data table or rectangle of relational databases. The rdflib package provides a friendly and concise user interface for performing common tasks on RDF data, such as reading, writing, and converting between the various serializations of RDF data (including rdfxml, turtle, nquads, ntriples, and json-ld), creating new RDF graphs, and performing graph queries using SPARQL. This package wraps the low-level redland R package, which provides direct bindings to the redland C library. Additionally, the package supports the newer and more developer-friendly JSON-LD format through the jsonld package. The package interface takes inspiration from the Python rdflib library.

View Documentation
Scientific use cases
  1. Panayiotou, C. (2020). An Ontological Analysis and Natural Language Processing of Figures of Speech. International Journal of Artificial Intelligence & Applications, 11(1), 17–30. https://doi.org/10.5121/ijaia.2020.11102
phonfieldwork
CRAN Peer-reviewed

Linguistic Phonetic Fieldwork Tools

George Moroz
Description

Phonetic research and experiments involve many routine tasks. These include creating a presentation that contains all stimuli, renaming and concatenating multiple sound files recorded during a session, automatic annotation in Praat TextGrids (one of the sound annotation standards provided by the Praat software, see Boersma & Weenink 2020 https://www.fon.hum.uva.nl/praat/), creating an HTML table with annotations and spectrograms, and converting between multiple formats (Praat TextGrid, ELAN, EXMARaLDA, Audacity, subtitles .srt, and FLEx flextext). All of these tasks can be solved by a mixture of different tools (any programming language has programs for automatic renaming, and Praat contains scripts for concatenating and renaming files, etc.). phonfieldwork provides functionality that makes it easier to solve those tasks independently of any additional tools. You can also compare its functionality with that of other packages: rPraat https://CRAN.R-project.org/package=rPraat, textgRid https://CRAN.R-project.org/package=textgRid.

View Documentation

Optimizing Acoustic Signal Detection

Marcelo Araya-Salas
Description

Facilitates the automatic detection of acoustic signals, providing functions to diagnose and optimize the performance of detection routines. Detections from other software can also be explored and optimized. This package has been peer-reviewed by rOpenSci. Araya-Salas et al. (2022) doi:10.1101/2022.12.13.520253.

View Documentation

Convert European Regional Data

Moritz Hennicke
Description

Motivated by changing administrative boundaries over time, the nuts package can convert European regional data with NUTS codes between versions (2006, 2010, 2013, 2016 and 2021) and levels (NUTS 1, NUTS 2 and NUTS 3). The package uses spatial interpolation as in Lam (1983) doi:10.1559/152304083783914958 based on granular (100m x 100m) area, population and land use data provided by the European Commission’s Joint Research Center.
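The general idea of converting regional values with interpolation weights can be sketched in a few lines of Python. This is a conceptual illustration of weighted reallocation for absolute values (counts or totals), not the nuts package's code or data; the region codes, the weights, and the function name are hypothetical.

```python
from collections import defaultdict


def convert_regions(values, overlaps):
    """Reallocate absolute values from old regions to new regions.

    values:   {old_code: value}
    overlaps: {(old_code, new_code): weight}, where weight measures how much
              of the old region (by area, population, or land use) falls in
              the new region.
    """
    # Total weight of each old region across all new regions it overlaps.
    totals = defaultdict(float)
    for (old, _new), weight in overlaps.items():
        totals[old] += weight

    # Each new region receives the old value in proportion to its share.
    converted = defaultdict(float)
    for (old, new), weight in overlaps.items():
        converted[new] += values[old] * weight / totals[old]
    return dict(converted)


# Hypothetical example: region A is split 3:1 between X and Y,
# and region B falls entirely inside Y.
result = convert_regions(
    {"A": 100.0, "B": 50.0},
    {("A", "X"): 3.0, ("A", "Y"): 1.0, ("B", "Y"): 2.0},
)
```

Because each old region's value is split according to shares that sum to one, the overall total is preserved by the conversion.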

View Documentation

Read and Write ODS Files

Chung-hong Chan
Description

Read ODS (OpenDocument Spreadsheet) files into R as data frames. Also supports writing data frames to ODS files.

View Documentation

Analysis of Work Loops and Other Data from Muscle Physiology Experiments

Vikram B. Baliga
Description

Functions for the import, transformation, and analysis of data from muscle physiology experiments. The work loop technique is used to evaluate the mechanical work and power output of muscle. Josephson (1985) doi:10.1242/jeb.114.1.493 modernized the technique for application in comparative biomechanics. Although our initial motivation was to provide functions to analyze work loop experiment data, as we developed the package we incorporated the ability to analyze data from experiments that are often complementary to work loops. There are currently three supported experiment types: work loops, simple twitches, and tetanus trials. Data can be imported directly from .ddf files or via an object constructor function. Through either method, data can then be cleaned or transformed via methods typically used in studies of muscle physiology. Data can then be analyzed to determine the timing and magnitude of force development and relaxation (for isometric trials) or the magnitude of work, net power, and instantaneous power among other things (for work loops). Although we do not provide plotting functions, all resultant objects are designed to be friendly to visualization via either base-R plotting or tidyverse functions. This package has been peer-reviewed by rOpenSci (v. 1.1.0).

View Documentation

Manage Data from Cardiopulmonary Exercise Testing

Simon Nolte
Description

Import, process, summarize and visualize raw data from metabolic carts. See Robergs, Dwyer, and Astorino (2010) doi:10.2165/11319670-000000000-00000 for more details on data processing.

View Documentation

Read Spectrometric Data and Metadata

Hugo Gruson
Description

Parse various reflectance/transmittance/absorbance spectra file formats to extract spectral data and metadata, as described in Gruson, White & Maia (2019) doi:10.21105/joss.01857. Among other formats, it can import files from Avantes https://www.avantes.com/, CRAIC https://www.microspectra.com/, and OceanInsight (formerly OceanOptics) https://www.oceaninsight.com/ brands.

View Documentation

Phylogenetic Reconstruction and Time-dating

Cristian Roman Palacios
Description

The phruta R package is designed to simplify the basic phylogenetic pipeline. Specifically, all code is run within the same program, and data from intermediate steps are saved in independent folders. Running all code within the same environment also increases the reproducibility of your analysis. phruta retrieves gene sequences, combines newly downloaded and local gene sequences, and performs sequence alignments.

View Documentation

Quantitative PCR Analysis with the Tidyverse

Edward Wallace
Description

For reproducible quantitative PCR (qPCR) analysis building on packages from the 'tidyverse', notably 'dplyr' and 'ggplot2'. It normalizes (by ddCq), summarizes, and plots pre-calculated Cq data, and plots raw amplification and melt curves from Roche Lightcycler (tm) machines. It does NOT (yet) calculate Cq data from amplification curves.
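As a reminder of what ddCq (delta-delta Cq) normalization computes, here is a minimal Python sketch of the standard calculation. It is not the package's API, and the package may use different conventions; the gene names, the Cq values, and the assumption of perfect doubling per cycle are illustrative only.

```python
from statistics import mean


def delta_cq(sample, target, reference):
    """Mean Cq of the target gene minus mean Cq of the reference gene."""
    return mean(sample[target]) - mean(sample[reference])


def delta_delta_cq(treated, control, target, reference):
    """Delta Cq of the treated sample minus delta Cq of the control sample."""
    return delta_cq(treated, target, reference) - delta_cq(control, target, reference)


def fold_change(ddcq):
    # Assumes the amplicon doubles every cycle (amplification efficiency of 2).
    return 2.0 ** (-ddcq)


# Hypothetical technical replicates (Cq values) for one target and one reference gene.
control = {"GENE": [24.0, 24.5, 23.5], "REF": [18.0, 18.25, 17.75]}
treated = {"GENE": [22.0, 22.5, 21.5], "REF": [18.0, 18.25, 17.75]}

ddcq = delta_delta_cq(treated, control, "GENE", "REF")
change = fold_change(ddcq)
```

In this made-up data the target reaches threshold two cycles earlier in the treated sample relative to the reference gene, which corresponds to a fourfold higher relative abundance under the doubling assumption.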

View Documentation

Create and Query a Local Copy of GenBank in R

Joel H. Nitta
Description

Download large sections of GenBank https://www.ncbi.nlm.nih.gov/genbank/ and generate a local SQL-based database. A user can then query this database using restez functions or through rentrez https://CRAN.R-project.org/package=rentrez wrappers.

View Documentation
Scientific use cases
  1. Bennett, D., Hettling, H., Silvestro, D., Vos, R., & Antonelli, A. (2018). restez: Create and Query a Local Copy of GenBank in R. Journal of Open Source Software, 3(31), 1102. https://doi.org/10.21105/joss.01102
  2. Ruiz-Sanchez, E., Maya-Lastra, C. A., Steinmann, V. W., Zamudio, S., Carranza, E., Murillo, R. M., & Rzedowski, J. (2019). Datataxa: a new script to extract metadata sequence information from GenBank, the Flora of Bajío as a case study. Botanical Sciences, 97(4), 754–760. https://doi.org/10.17129/botsci.2226
CoordinateCleaner
CRAN Peer-reviewed

Automated Cleaning of Occurrence Records from Biological Collections

Alexander Zizka
Description

Automated flagging of common spatial and temporal errors in biological and paleontological collection data, for use in conservation, ecology, and paleontology. Includes automated tests to easily flag (and exclude) records assigned to country or province centroids, the open ocean, the headquarters of the Global Biodiversity Information Facility, urban areas, or the location of biodiversity institutions (museums, zoos, botanical gardens, universities). It also identifies per-species outlier coordinates, zero coordinates, identical latitude/longitude, and invalid coordinates, and implements an algorithm to identify data sets with a significant proportion of rounded coordinates. Especially suited for large data sets. The reference for the methodology is: Zizka et al. (2019) doi:10.1111/2041-210X.13152.
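Three of the simpler checks named above (invalid coordinates, zero coordinates, and identical latitude/longitude) can be illustrated with a small Python sketch. This is only a conceptual example: the function name and flag labels are hypothetical, and the package's own tests may apply buffers or other rules not shown here.

```python
def flag_record(lat, lon):
    """Return a list of simple coordinate problems found for one record."""
    # Missing or out-of-range values make every other check meaningless.
    if lat is None or lon is None:
        return ["invalid"]
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return ["invalid"]

    flags = []
    if lat == 0 and lon == 0:
        flags.append("zero")            # often a placeholder for missing data
    if lat == lon:
        flags.append("equal_lat_lon")   # often a data-entry error
    return flags


# Hypothetical occurrence records: (record ID, latitude, longitude).
records = [(1, 48.1, 11.6), (2, 0.0, 0.0), (3, 95.0, 10.0), (4, 12.5, 12.5)]
flagged = {rid: flag_record(lat, lon) for rid, lat, lon in records}
```

Records with an empty flag list pass these three checks; the others can be reviewed or excluded before analysis.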

View Documentation
Scientific use cases
  1. Milla, R., Bastida, J. M., Turcotte, M. M., Jones, G., Violle, C., Osborne, C. P., … Byun, C. (2018). Phylogenetic patterns and phenotypic profiles of the species of plants and mammals farmed for food. Nature Ecology & Evolution, 2(11), 1808–1817. https://doi.org/10.1038/s41559-018-0690-4
  2. Zizka, A., Silvestro, D., Andermann, T., Azevedo, J., Duarte Ritter, C., Edler, D., … Antonelli, A. (2019). CoordinateCleaner: standardized cleaning of occurrence records from biological collection databases. Methods in Ecology and Evolution. https://doi.org/10.1111/2041-210x.13152
  3. Rice, A., Šmarda, P., Novosolov, M., Drori, M., Glick, L., Sabath, N., … Mayrose, I. (2019). The global biogeography of polyploid plants. Nature Ecology & Evolution, 3(2), 265–273. https://doi.org/10.1038/s41559-018-0787-9
  4. Karger, D. N., Kessler, M., Conrad, O., Weigelt, P., Kreft, H., König, C., & Zimmermann, N. E. (2019). Why tree lines are lower on islands-Climatic and biogeographic effects hold the answer. Global Ecology and Biogeography. https://doi.org/10.1111/geb.12897
  5. De Frenne, P., Zellweger, F., Rodríguez-Sánchez, F., Scheffers, B. R., Hylander, K., Luoto, M., … Lenoir, J. (2019). Global buffering of temperatures under forest canopies. Nature Ecology & Evolution. https://doi.org/10.1038/s41559-019-0842-1
  6. Colli‐Silva, M., Vasconcelos, T. N. C., & Pirani, J. R. (2019). Outstanding plant endemism levels strongly support the recognition of campo rupestre provinces in mountaintops of eastern South America. Journal of Biogeography. https://doi.org/10.1111/jbi.13585
  7. Waller, J. (2019). Data Location Quality at GBIF. Biodiversity Information Science and Standards, 3. https://doi.org/10.3897/biss.3.35829
  8. Butterfield, B. J., Holmgren, C. A., Anderson, R. S., & Betancourt, J. L. (2019). Life history traits predict colonization and extinction lags of desert plant species since the Last Glacial Maximum. Ecology. https://doi.org/10.1002/ecy.2817
  9. Wüest, R. O., Zimmermann, N. E., Zurell, D., Alexander, J. M., Fritz, S. A., Hof, C., … Karger, D. N. (2019). Macroecology in the age of Big Data – Where to go from here? Journal of Biogeography. https://doi.org/10.1111/jbi.13633
  10. Pender, J. E., Hipp, A. L., Hahn, M., Kartesz, J., Nishino, M., & Starr, J. R. (2019). How sensitive are climatic niche inferences to distribution data sampling? A comparison of Biota of North America Program (BONAP) and Global Biodiversity Information Facility (GBIF) datasets. Ecological Informatics, 100991. https://doi.org/10.1016/j.ecoinf.2019.100991
  11. Feng, X., Park, D. S., Walker, C., Peterson, A. T., Merow, C., & Papeş, M. (2019). A checklist for maximizing reproducibility of ecological niche models. Nature Ecology & Evolution. https://doi.org/10.1038/s41559-019-0972-5
  12. Espinosa, B. S., D’Apolito, C., Silva-Caminha, S. A. F., Ferreira, M. G., & Absy, M. L. (2020). Neogene paleoecology and biogeography of a Malvoid pollen in northwestern South America. Review of Palaeobotany and Palynology, 273, 104131. https://doi.org/10.1016/j.revpalbo.2019.104131
  13. Jin, J., & Yang, J. (2020). BDcleaner: A workflow for cleaning taxonomic and geographic errors in occurrence data archived in biodiversity databases. Global Ecology and Conservation, 21, e00852. https://doi.org/10.1016/j.gecco.2019.e00852
  14. Zizka, A., Azevedo, J., Leme, E., Neves, B., Costa, A. F., Caceres, D., & Zizka, G. (2019). Biogeography and conservation status of the pineapple family (Bromeliaceae). Diversity and Distributions, 26(2), 183–195. https://doi.org/10.1111/ddi.13004
  15. Marshall, B. M., & Strine, C. T. (2019). Exploring snake occurrence records: Spatial biases and marginal gains from accessible social media. PeerJ, 7, e8059. https://doi.org/10.7717/peerj.8059
  16. Asevedo, L., D’Apolito, C., Misumi, S. Y., Barros, M. A. de, Barth, O. M., & Avilla, L. dos S. (2020). Palynological analysis of dental calculus from Pleistocene proboscideans of southern Brazil: A new approach for paleodiet and paleoenvironmental reconstructions. Palaeogeography, Palaeoclimatology, Palaeoecology, 540, 109523. https://doi.org/10.1016/j.palaeo.2019.109523
  17. Léveillé-Bourret, É., Chen, B.-H., Garon-Labrecque, M.-È., Ford, B. A., & Starr, J. R. (2020). RAD sequencing resolves the phylogeny, taxonomy and biogeography of Trichophoreae despite a recent rapid radiation (Cyperaceae). Molecular Phylogenetics and Evolution, 145, 106727. https://doi.org/10.1016/j.ympev.2019.106727
  18. Moudrý, V., & Devillers, R. (2020). Quality and usability challenges of global marine biodiversity databases: An example for marine mammal data. Ecological Informatics, 56, 101051. https://doi.org/10.1016/j.ecoinf.2020.101051
  19. Alfaro-Ramírez, F. U., Ramírez-Albores, J. E., Vargas-Hernández, J. J., Franco-Maass, S., & Pérez-Suárez, M. (2020). Potential reduction of Hartweg’s Pine (Pinus hartwegii Lindl.) geographic distribution. PLOS ONE, 15(2), e0229178. https://doi.org/10.1371/journal.pone.0229178
  20. Armitage, D. W., & Jones, S. E. (2020). Barriers to coexistence limit the poleward range of a globally-distributed plant. https://doi.org/10.1101/2020.02.24.946574
  21. Zizka, A., Carvalho‐Sobrinho, J. G., Pennington, R. T., Queiroz, L. P., Alcantara, S., Baum, D. A., … Antonelli, A. (2020). Transitions between biomes are common and directional in Bombacoideae (Malvaceae). Journal of Biogeography. https://doi.org/10.1111/jbi.13815
  22. Bernardi, A. P., Lauterjung, M. B., Mantovani, A., & dos Reis, M. S. (2020). Phylogeography and species distribution modeling reveal a historic disjunction for the conifer Podocarpus lambertii. Tree Genetics & Genomes, 16(3). https://doi.org/10.1007/s11295-020-01434-2
  23. Gaynor, M. L., Fu, C., Gao, L., Lu, L., Soltis, D. E., & Soltis, P. S. (2020). Biogeography and ecological niche evolution in Diapensiaceae inferred from phylogenetic analysis. Journal of Systematics and Evolution. https://doi.org/10.1111/jse.12646
  24. Pacifico, R., Almeda, F., Frota, A., & Fidanza, K. (2020). Areas of endemism on Brazilian mountaintops revealed by taxonomically verified records of Microlicieae (Melastomataceae). Phytotaxa, 450(2), 119–148. https://doi.org/10.11646/phytotaxa.450.2.1
  25. Waldock, C. A., De Palma, A., Borges, P. A. V., & Purvis, A. (2020). Insect occurrence in agricultural land‐uses depends on realized niche and geographic range properties. Ecography. https://doi.org/10.1111/ecog.05162
  26. Colli‐Silva, M., Reginato, M., Cabral, A., Forzza, R. C., Pirani, J. R., & Vasconcelos, T. N. da C. (2020). Evaluating shortfalls and spatial accuracy of biodiversity documentation in the Atlantic Forest, the most diverse and threatened Brazilian phytogeographic domain. TAXON, 69(3), 567–577. https://doi.org/10.1002/tax.12239
  27. Sanchez‐Martinez, P., Martínez‐Vilalta, J., Dexter, K. G., Segovia, R. A., & Mencuccini, M. (2020). Adaptation and coordinated evolution of plant hydraulic traits. Ecology Letters. https://doi.org/10.1111/ele.13584
  28. Reimuth, J., & Zotz, G. (2020). The biogeography of the megadiverse genus Anthurium (Araceae). Botanical Journal of the Linnean Society, 194(2), 164–176. https://doi.org/10.1093/botlinnean/boaa044
  29. Polaina, E., Pärt, T., & Recio, M. R. (2020). Identifying hotspots of invasive alien terrestrial vertebrates in Europe to assist transboundary prevention and control. Scientific Reports, 10(1). https://doi.org/10.1038/s41598-020-68387-3
  30. Mothes, C. C., Howell, H. J., & Searcy, C. A. (2020). Habitat suitability models for the imperiled wood turtle (Glyptemys insculpta) raise concerns for the species’ persistence under future climate change. Global Ecology and Conservation, 24, e01247. https://doi.org/10.1016/j.gecco.2020.e01247
  31. Nania, D., Flecks, M., & Rödder, D. (2020). Continuous expansion of the geographic range linked to realized niche expansion in the invasive Mourning gecko Lepidodactylus lugubris (Duméril & Bibron, 1836). PLOS ONE, 15(7), e0235060. https://doi.org/10.1371/journal.pone.0235060
  32. Brightly, W. H., Hartley, S. E., Osborne, C. P., Simpson, K. J., & Strömberg, C. A. E. (2020). High silicon concentrations in grasses are linked to environmental conditions and not associated with C4 photosynthesis. Global Change Biology. https://doi.org/10.1111/gcb.15343
  33. Paton, A., Antonelli, A., Carine, M., Forzza, R. C., Davies, N., Demissew, S., … Dickie, J. (2020). Plant and fungal collections: Current status, future perspectives. PLANTS, PEOPLE, PLANET, 2(5), 499–514. https://doi.org/10.1002/ppp3.10141
  34. Carrillo, J. D., Faurby, S., Silvestro, D., Zizka, A., Jaramillo, C., Bacon, C. D., & Antonelli, A. (2020). Disproportionate extinction of South American mammals drove the asymmetry of the Great American Biotic Interchange. Proceedings of the National Academy of Sciences, 117(42), 26281–26287. https://doi.org/10.1073/pnas.2009397117
  35. Figueroa, H., & Smith, S. A. (2020). A targeted phylogenetic approach helps explain New World functional diversity patterns of two eudicot lineages. Journal of Biogeography. https://doi.org/10.1111/jbi.13993
  36. Simpson, K. J., Jardine, E. C., Archibald, S., Forrestel, E. J., Lehmann, C. E. R., Thomas, G. H., & Osborne, C. P. (2020). Resprouting grasses are associated with less frequent fire than seeders. New Phytologist. https://doi.org/10.1111/nph.17069
  37. Bello, C., Cintra, A. L. P., Barreto, E., Vancine, M. H., Sobral-Souza, T., Graham, C. H., & Galetti, M. (2020). Environmental niche and functional role similarity between invasive and native palms in the Atlantic Forest. Biological Invasions. https://doi.org/10.1007/s10530-020-02400-8
  38. Roigé, M., & Phillips, C. B. (2021). Validation and uncertainty analysis of the match climates regional algorithm (CLIMEX) for Pest risk analysis. Ecological Informatics, 61, 101196. https://doi.org/10.1016/j.ecoinf.2020.101196
  39. Panter, C. T., Clegg, R. L., Moat, J., Bachman, S. P., Klitgård, B. B., & White, R. L. (2020). To clean or not to clean: Cleaning open‐source data improves extinction risk assessments for threatened plant species. Conservation Science and Practice, 2(12). https://doi.org/10.1111/csp2.311
  40. Chowdhury, S., Braby, M. F., Fuller, R. A., & Zalucki, M. P. (2020). Coasting along to a wider range: niche conservatism in the recent range expansion of the Tawny Coster, Acraea terpsicore (Lepidoptera: Nymphalidae). Diversity and Distributions. https://doi.org/10.1111/ddi.13200
  41. Farooq, H., Azevedo, J. A. R., Soares, A., Antonelli, A., & Faurby, S. (2020). Mapping Africa’s Biodiversity: More of the Same Is Just Not Good Enough. Systematic Biology. https://doi.org/10.1093/sysbio/syaa090
  42. Tamme, R., Pärtel, M., Kõljalg, U., Laanisto, L., Liira, J., Mander, Ü., … Zobel, M. (2020). Global macroecology of nitrogen‐fixing plants. Global Ecology and Biogeography, 30(2), 514–526. https://doi.org/10.1111/geb.13236
  43. Esquivel, D. A., Aya-Cuero, C., Penagos, A. P., Chacón-Pacheco, J., Agámez-López, C. J., Ochoa, A. V., … Bennett, D. (2020). Updating the distribution of Vampyrum spectrum (Chiroptera, Phyllostomidae) in Colombia: new localities, potential distribution and notes on its conservation. Neotropical Biology and Conservation, 15(4), 689–709. https://doi.org/10.3897/neotropical.15.e58383
  44. Suissa, J. S., & Sundue, M. A. (2020). Diversity Patterns of Neotropical Ferns: Revisiting Tryon’s Centers of Richness and Endemism. American Fern Journal, 110(4). https://doi.org/10.1640/0002-8444-110.4.211
  45. Bello, A., Mukhtar, F. B., & Muellner-Riehl, A. N. (2021). Diversity and distribution of Nigerian legumes (Fabaceae). Phytotaxa, 480(2), 103–124. https://doi.org/10.11646/phytotaxa.480.2.1
  46. Delso, A., Muñoz, J., & Fajardo, J. (2021). Protected Area Networks Do Not Represent Unseen Diversity. https://doi.org/10.21203/rs.3.rs-145219/v1
  47. Escobar, S., Helmstetter, A. J., Jarvie, S., Montúfar, R., Balslev, H., & Couvreur, T. L. P. (2021). Pleistocene climatic fluctuations promoted alternative evolutionary histories in Phytelephas aequatorialis, an endemic palm from western Ecuador. Journal of Biogeography, 48(5), 1023–1037. https://doi.org/10.1111/jbi.14055
  48. Ryeland, J., Derham, T. T., & Spencer, R. J. (2021). Past and future potential range changes in one of the last large vertebrates of the Australian continent, the emu Dromaius novaehollandiae. Scientific Reports, 11(1). https://doi.org/10.1038/s41598-020-79551-0
gendercoder

Recodes Sex/Gender Descriptions into a Standard Set

Yaoxiang Li
Description

Provides functions and dictionaries for recoding free-text gender responses into more consistent categories.
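
The dictionary-based recoding described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and does not show the package's own functions; the dictionary entries, the category labels, and the name `recode_gender` are all hypothetical.

```python
# Hypothetical lookup table mapping normalized free-text responses to categories.
# The entries are illustrative and are not taken from the package's dictionaries.
GENDER_DICTIONARY = {
    "f": "woman",
    "female": "woman",
    "woman": "woman",
    "m": "man",
    "male": "man",
    "man": "man",
    "nonbinary": "non-binary",
    "non-binary": "non-binary",
}


def recode_gender(responses, dictionary=GENDER_DICTIONARY, retain_unmatched=True):
    """Recode free-text responses using a dictionary lookup.

    Each response is trimmed and lower-cased before lookup. Unmatched
    responses are kept as typed when retain_unmatched is True, and are
    replaced by None otherwise.
    """
    recoded = []
    for response in responses:
        key = response.strip().lower()
        if key in dictionary:
            recoded.append(dictionary[key])
        elif retain_unmatched:
            recoded.append(response)
        else:
            recoded.append(None)
    return recoded
```

Keeping unmatched responses by default is a design choice in this sketch: it avoids silently discarding answers that the dictionary does not yet cover.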

View Documentation

Checks for Exclusion Criteria in Online Data

Jeffrey R. Stevens
Description

Data that are collected through online sources such as Mechanical Turk may require excluding rows because of IP address duplication, geolocation, or completion duration. This package facilitates exclusion of these data for Qualtrics datasets.

View Documentation
fastMatMR
Peer-reviewed

High-Performance Matrix Market File Operations

Rohit Goswami
Description

An interface to the fast_matrix_market C++ library, this package offers efficient read and write operations for Matrix Market files in R. It supports both sparse and dense matrix formats. Peer-reviewed at rOpenSci (https://github.com/ropensci/software-review/issues/606).

View Documentation

World Magnetic Model

Will Frierson
Description

Calculate the magnetic field at a given location and time according to the World Magnetic Model (WMM). Both the main field and secular variation components are returned. This functionality is useful for physicists and geophysicists who need orthogonal components from WMM. Currently, this package supports annualized time inputs between 2000 and 2025. If desired, users can specify which WMM version to use, e.g., the original WMM2015 release or the recent out-of-cycle WMM2015 release. Methods used to implement WMM, including the Gauss coefficients for each release, are described in the following publications: Chulliat et al. (2020) doi:10.25923/ytk1-yx35, Chulliat et al. (2019) doi:10.25921/xhr3-0t19, Chulliat et al. (2015) doi:10.7289/V5TB14V7, Maus et al. (2010) https://www.ngdc.noaa.gov/geomag/WMM/data/WMMReports/WMM2010_Report.pdf, McLean et al. (2004) https://www.ngdc.noaa.gov/geomag/WMM/data/WMMReports/TRWMM_2005.pdf, and Macmillan et al. (2000) https://www.ngdc.noaa.gov/geomag/WMM/data/WMMReports/wmm2000.pdf.

View Documentation
phylocomr
CRAN

Interface to Phylocom

Luna Luisa Sanchez Reyes
Description

Interface to Phylocom (https://phylodiversity.net/phylocom/), a library for analysis of phylogenetic community structure and character evolution. Includes low level methods for interacting with the three executables, as well as higher level interfaces for methods like aot, ecovolve, bladj, phylomatic, and more.

View Documentation
Scientific use cases
  1. Perez, T. M., & Feeley, K. J. (2020). Weak phylogenetic and climatic signals in plant heat tolerance. Journal of Biogeography. https://doi.org/10.1111/jbi.13984
  2. Perez, T. M., Socha, A., Tserej, O., & Feeley, K. J. (2021). Photosystem II heat tolerances characterize thermal generalists and the upper limit of carbon assimilation. Plant, Cell & Environment. https://doi.org/10.1111/pce.13990
coder
CRAN

Deterministic Categorization of Items Based on External Code Data

Erik Bulow
Description

Fast categorization of items based on external code data identified by regular expressions. A typical use case considers patients with medically coded data, such as codes from the International Classification of Diseases (ICD) or the Anatomical Therapeutic Chemical (ATC) classification system. The package's functions rely on a triad of objects: (1) case data with unit IDs and possible dates of interest; (2) external code data for the corresponding units in (1), with optional dates of interest; and (3) a classification scheme (classcodes object) with regular expressions to identify and categorize relevant codes from (2). It is easy to introduce new classification schemes (classcodes objects) or to use the default schemes included in the package. Use cases include patient categorization based on comorbidity indices such as Charlson, Elixhauser, RxRisk V, or the comorbidity-polypharmacy score (CPS), as well as adverse events after hip and knee replacement surgery.
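
The triad of objects described above can be sketched as follows. This is an illustration in Python, not the package's interface: the function name `categorize`, the sample records, and the regular expressions are hypothetical and deliberately simplified (they are not the Charlson or Elixhauser definitions). The sketch also assumes that a code counts only if it is dated on or before the case's date of interest.

```python
import re
from datetime import date

# (1) Case data: unit IDs with a date of interest (hypothetical values).
cases = [
    {"id": "p1", "date": date(2020, 6, 1)},
    {"id": "p2", "date": date(2020, 6, 1)},
]

# (2) External code data for the same units, each with a date.
codes = [
    {"id": "p1", "code": "I21", "date": date(2020, 1, 15)},
    {"id": "p1", "code": "E11", "date": date(2021, 3, 1)},  # after the case date
    {"id": "p2", "code": "J44", "date": date(2019, 11, 30)},
]

# (3) Classification scheme: category name -> regular expression.
# These patterns are simplified placeholders, not a published index.
scheme = {
    "myocardial infarction": re.compile(r"^I2[12]"),
    "diabetes": re.compile(r"^E1[0-4]"),
    "chronic pulmonary disease": re.compile(r"^J4[0-7]"),
}


def categorize(cases, codes, scheme):
    """Return, for each case ID, a dict of category -> True/False."""
    result = {}
    for case in cases:
        flags = {category: False for category in scheme}
        for row in codes:
            # Assumption: skip codes for other units or dated after the case date.
            if row["id"] != case["id"] or row["date"] > case["date"]:
                continue
            for category, pattern in scheme.items():
                if pattern.search(row["code"]):
                    flags[category] = True
        result[case["id"]] = flags
    return result
```

Calling `categorize(cases, codes, scheme)` flags `p1` for myocardial infarction only, because the diabetes code is dated after the case date, and flags `p2` for chronic pulmonary disease.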

View Documentation

Wrangle, Analyze, and Visualize Animal Movement Data

Vikram B. Baliga
Description

Tools to import, clean, and visualize movement data, particularly from motion capture systems such as Optitrack's Motive, the Straw Lab's Flydra, or from other sources. We provide functions to remove artifacts, standardize tunnel position and tunnel axes, select a region of interest, isolate specific trajectories, fill gaps in trajectory data, and calculate 3D and per-axis velocity. For experiments of visual guidance, we also provide functions that use subject position to estimate perception of visual stimuli.

View Documentation

Polyhedra Database

Alejandro Baranek
Description

A polyhedra database scraped from various sources, provided as R6 objects with rgl visualization capabilities.

View Documentation
qcoder

Lightweight Qualitative Coding

Elin Waring
Description

A free, lightweight, open source option for analyzing text-based qualitative data. Enables analysis of interview transcripts, observation notes, memos, and other sources. Supports the work of social scientists, historians, humanists, and other researchers who use qualitative methods. Addresses the unique challenges of qualitative data analysis. Provides opportunities for researchers who otherwise might not develop software to build software development skills.

View Documentation

Tree Biomass Estimation at Extra-Tropical Forest Plots

Erika Gonzalez-Akre
Description

Standardize and simplify the tree biomass estimation process across globally distributed extratropical forests.

View Documentation
treedata.table
Peer-reviewed

Manipulation of Matched Phylogenies and Data using data.table

Cristian Roman-Palacios
Description

An implementation that combines trait data and a phylogenetic tree (or trees) into a single object of class treedata.table. The resulting object can be easily manipulated to simultaneously change the trait- and tree-level sampling. Currently implemented functions allow users to use a data.table syntax when performing operations on the trait dataset within the treedata.table object.

View Documentation

Ecological Metadata as Linked Data

Carl Boettiger
Description

This is a utility for transforming Ecological Metadata Language (EML) files into JSON-LD and back into EML. Doing so creates a list-based representation of EML in R, so that EML data can easily be manipulated using standard R tools. This makes the package an effective backend for other R-based tools working with EML. By abstracting away the complexity of XML Schema, developers can build around native R list objects and not have to worry about satisfying many of the additional constraints set by the schema (such as element ordering, which is handled automatically). Additionally, the JSON-LD representation enables the use of developer-friendly JSON parsing and serialization, which may facilitate the use of EML in contexts outside of R, as well as informatics-friendly serializations such as RDF and SPARQL queries.

View Documentation

Positron Emission Tomography Time-Activity Curve Analysis

Eric Brown
Description

To facilitate the analysis of positron emission tomography (PET) time activity curve (TAC) data, and to encourage open science and replicability, this package supports data loading and analysis of multiple TAC file formats. Functions are available to analyze loaded TAC data for individual participants or in batches. Major functionality includes weighted TAC merging by region of interest (ROI), calculating models including standardized uptake value ratio (SUVR) and distribution volume ratio (DVR, Logan et al. 1996 doi:10.1097/00004647-199609000-00008), basic plotting functions and calculation of cut-off values (Aizenstein et al. 2008 doi:10.1001/archneur.65.11.1509). Please see the walkthrough vignette for a detailed overview of tacmagic functions.

View Documentation
Scientific use cases
  1. Brown, E. E., Rashidi‐Ranjbar, N., Caravaggio, F., Gerretsen, P., Pollock, B. G., … Mulsant, B. H. (2019). Brain Amyloid PET Tracer Delivery is Related to White Matter Integrity in Patients with Mild Cognitive Impairment. Journal of Neuroimaging. https://doi.org/10.1111/jon.12646

Conduct Co-Localization Analysis of Fluorescence Microscopy Images

Mahmoud Ahmed
Description

Automate the co-localization analysis of fluorescence microscopy images. Select regions of interest, extract pixel intensities from the image channels, and calculate different co-localization statistics. The methods implemented in this package are based on Dunn et al. (2011) doi:10.1152/ajpcell.00462.2010.

View Documentation
Scientific use cases
  1. Ahmed, M., Lai, T. H., & Kim, D. R. (2019). colocr: An R package for conducting co-localization analysis on fluorescence microscopy images. https://doi.org/10.7287/peerj.preprints.27613v1
  2. Nguyen, H. Q., Nguyen, V. D., Van Nguyen, H., & Seo, T. S. (2020). Quantification of colorimetric isothermal amplification on the smartphone and its open-source app for point-of-care pathogen detection. Scientific Reports, 10(1). https://doi.org/10.1038/s41598-020-72095-3

Dendrograms for Evolutionary Analysis

Shaun Wilkinson
Description

Contains functions for developing phylogenetic trees as deeply-nested lists (“dendrogram” objects). Enables bi-directional conversion between dendrogram and “phylo” objects (see Paradis et al (2004) doi:10.1093/bioinformatics/btg412), and features several tools for command-line tree manipulation and import/export via Newick parenthetic text.

View Documentation
Scientific use cases
  1. Sawa, T., Momiyama, K., Mihara, T., Kainuma, A., Kinoshita, M., & Moriyama, K. (2020). Molecular epidemiology of clinically high‐risk Pseudomonas aeruginosa strains: Practical overview. Microbiology and Immunology. https://doi.org/10.1111/1348-0421.12776
  2. Alvarado-Ortega, J., & Díaz-Cruz, J. A. (2021). Hastichthys totonacus sp. nov., a North American Turonian dercetid fish (Teleostei, Aulopiformes) from the Huehuetla quarry, Puebla, Mexico. Journal of South American Earth Sciences, 105, 102900. https://doi.org/10.1016/j.jsames.2020.102900

Generate Starting Trees For Combined Molecular, Morphological and Stratigraphic Data

April Wright
Description

Combine a list of taxa with a phylogeny to generate a starting tree for use in total evidence dating analyses.

View Documentation