Image processing is one of the core focus areas of rOpenSci. Over the last few months we have released several major upgrades to core packages in our imaging suite, including magick, tesseract, and av. This post highlights a few cool new features.
Magick 2.2
The magick package is one of the most powerful packages for image processing in R. It interfaces to the ImageMagick C++ API and can take advantage of several other R packages that provide imaging functionality. Versions 2.1 and 2.2 include many small fixes and new features; the NEWS file has the full list of changes.
...citecorp is a new R package (it hit CRAN in late August) for working with data from the OpenCitations Corpus (OCC). OpenCitations, run by David Shotton and Silvio Peroni, houses the OCC, an open repository of scholarly citation data under the very open CC0 license. The I4OC (Initiative for Open Citations) is a collaboration between many parties, with the aim of promoting “unrestricted availability of scholarly citation data”. Citation data is available through Crossref, and in R via our packages rcrossref, fulltext and crminer. Citation data is also available via the OCC, and this OCC data is now accessible in R through the new package citecorp....
The UCSC Xena platform provides an unprecedented resource for public omics data from big projects like The Cancer Genome Atlas (TCGA). However, it is hard for users to combine multiple datasets or data types, integrate the selected data with popular analysis tools or homebrewed code, and reproduce analysis procedures. To address this issue, we developed the R package UCSCXenaTools, which enables data retrieval, analysis integration and reproducible research for omics data from the UCSC Xena platform [1]....
Teaching collaborative software development
In the University of British Columbia’s Master of Data Science program, one of the courses we teach is Collaborative Software Development (DSCI 524). In this course we focus on teaching how to apply collaborative software development practices in data science workflows. This includes appropriate use of the software life cycle, unit testing and continuous integration, as well as packaging code for use by others.
...The free online book Open Forensic Science in R was created to foster open science practices in the forensic science community. It consists of eight chapters: an introduction, followed by seven chapters covering different areas of forensic science, namely the validation of DNA interpretation systems, firearms analysis of bullets and casings, latent fingerprints, shoe outsole impressions, trace glass evidence, and decision-making in forensic identification tasks. Every chapter of Open Forensic Science in R has the same five sections: Introduction, Data, R Package(s), Drawing Conclusions, and Case Study. R code throughout each chapter guides the reader through an analysis, and the case study walks the reader through solving a forensic science problem in R, from reading the data to answering a specific question such as “Were these two bullets fired by the same gun?”...