rOpenSci | Blog


High Performance CommonMark and Github Markdown Rendering in R

This week the folks at GitHub open-sourced their fork of libcmark (based on the extensive PR by Mathieu Duponchelle), which they use to render markdown text within documents, issues, comments and anything else on the GitHub website. The new release of the commonmark R package incorporates this library so that we can take advantage of GitHub-quality markdown rendering in R.

The most exciting change is that the library has gained an extension mechanism to provide optional rendering features which are missing from the CommonMark spec. Most notably, GitHub has added extensions for rendering GFM-style tables and autolinks, both very useful features for R users.
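To make the extension mechanism concrete, here is a minimal sketch that renders the same markdown with and without extensions. It assumes the commonmark package is installed and uses the `extensions` argument of `markdown_html()`; the sample table and sentence are made up for illustration.

```r
# Assumes the commonmark package is installed: install.packages("commonmark")
library(commonmark)

# A small GFM-style table followed by a bare URL (illustrative content only)
md <- paste(
  "| package    | purpose            |",
  "|------------|--------------------|",
  "| commonmark | markdown rendering |",
  "",
  "Docs live at https://ropensci.org",
  sep = "\n"
)

# Strict CommonMark: the pipe table is treated as ordinary paragraph text
plain <- markdown_html(md)

# Extensions switched on: tables and autolinks are rendered
gfm <- markdown_html(md, extensions = TRUE)

cat(gfm)
```

Comparing `plain` and `gfm` side by side is a quick way to see which parts of a document depend on the GitHub extensions rather than on the core spec.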

...

The rOpenSci geospatial suite

Geospatial data - data embedded in a spatial context - is used across disciplines, whether it be history, biology, business, tech, public health, etc. Along with community contributors, we’re working on a suite of tools to make working with spatial data in R as easy as possible.

If you’re not familiar with geospatial tools, it’s helpful to see what people do with them in the real world.

Example 1

One of our geospatial packages, geonames, is used for geocoding, the practice of either sorting out place names from geographic data, or vice versa. geonames interfaces with the open database of the same name: https://www.geonames.org/. A recent paper in PLOS ONE highlights a common use case. Harsch & HilleRisLambers [1] asked how plant species distributions have shifted due to climate warming. They used the GNsrtm3() function in geonames, which draws on Shuttle Radar Topography Mission elevation data, to fill in missing or incorrect elevation values in their dataset.

...

fauxpas - HTTP conditions package

HTTP, or Hypertext Transfer Protocol, is the protocol by which most of us interact with the web. When we make requests to a website in a browser on desktop or mobile, or get some data from a server in R, all of that is using HTTP.

HTTP has a rich suite of status codes describing different HTTP conditions, ranging from success to various client errors to server errors. R has a few HTTP client libraries - crul, curl, httr, and RCurl - each of which is slightly different. I thought it would be nice if there were a single way to do HTTP exception handling across these libraries.
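As a rough illustration of client-independent HTTP exception handling, the base R sketch below turns a status code into a classed condition that any `tryCatch()` can handle. This is not the fauxpas API: `http_condition()` and `stop_for_status_code()` are hypothetical helpers invented for this example, and it only assumes that whichever client you use can hand you an integer status code.

```r
# Hypothetical helper (not part of fauxpas): build a classed error condition
# for a 4xx or 5xx HTTP status code.
http_condition <- function(status, message = NULL) {
  status <- as.integer(status)
  family <- if (status >= 500L) {
    "http_server_error"
  } else if (status >= 400L) {
    "http_client_error"
  } else {
    stop("status ", status, " is not an error status")
  }
  if (is.null(message)) message <- paste("HTTP error", status)
  structure(
    class = c(paste0("http_", status), family, "http_error", "error", "condition"),
    list(message = message, call = NULL, status = status)
  )
}

# Hypothetical helper: signal an error only when the status is 400 or above.
stop_for_status_code <- function(status) {
  if (status >= 400L) stop(http_condition(status))
  invisible(status)
}

# Handlers can be as specific or as general as the caller needs;
# the first handler whose class matches the condition is used.
outcome <- tryCatch(
  stop_for_status_code(404L),
  http_404          = function(e) "not found, maybe skip this record",
  http_client_error = function(e) "some other client error",
  http_server_error = function(e) "server trouble, maybe retry later"
)
print(outcome)
```

Because the condition carries both a specific class and a family class, the same handler code works no matter which HTTP client produced the status code.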

...

The new Tesseract package: High Quality OCR in R

Optical character recognition (OCR) is the process of extracting written or typed text from images such as photos and scanned documents into machine-encoded text. The new rOpenSci package tesseract brings one of the best open-source OCR engines to R. This enables researchers or journalists, for example, to search and analyze vast numbers of documents that are only available in printed form.

People looking to extract text and metadata from PDF files in R should try our pdftools package.

...

crul - an HTTP client

A new package crul is on CRAN. crul is another HTTP client for R, but is relatively simplified compared to httr, and is being built to link closely with webmockr and vcr. webmockr and vcr are packages ported from Ruby’s webmock and vcr, respectively. They both make mocking HTTP requests really easy.

A major use case for mocking HTTP requests is unit testing. Nearly all the packages I work on personally make HTTP requests in their test suites, so I wanted to make it really easy to mock HTTP requests. You don’t have to use mocking in test suites, of course.

...

Working together to push science forward

Happy rOpenSci users can be found at