rOpenSci | Blog

The new Tesseract package: High Quality OCR in R

Optical character recognition (OCR) is the process of converting written or typed text in images, such as photos and scanned documents, into machine-encoded text. The new rOpenSci package tesseract brings one of the best open-source OCR engines to R. This enables researchers or journalists, for example, to search and analyze vast numbers of documents that are only available in printed form.

People looking to extract text and metadata from PDF files in R should try our pdftools package.

...

crul - an HTTP client

A new package crul is on CRAN. crul is another HTTP client for R, but it is simpler than httr and is being built to integrate closely with webmockr and vcr. webmockr and vcr are packages ported from Ruby’s webmock and vcr, respectively. They both make mocking HTTP requests really easy.

A major use case for mocking HTTP requests is unit testing. Nearly all the packages I work on personally make HTTP requests in their test suites, so I wanted to make it really easy to mock HTTP requests. You don’t have to use mocking in test suites, of course.
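To make the idea of mocking an HTTP request in a unit test concrete, here is a minimal sketch. It is written in Python with the standard library only, and it does not use crul, webmockr, or vcr. The function `fetch_station_name`, the URL `https://api.example.org`, and the `FakeResponse` class are all hypothetical names invented for illustration.

```python
import json
from unittest import mock
from urllib import request


def fetch_station_name(station_id):
    """Hypothetical function under test: asks a made-up API for a station name."""
    url = f"https://api.example.org/stations/{station_id}"
    with request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))["name"]


class FakeResponse:
    """Stands in for the object urlopen() returns; supports `with` and read()."""

    def __init__(self, payload):
        self._body = json.dumps(payload).encode("utf-8")

    def read(self):
        return self._body

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False


def run_mocked_fetch():
    # Swap out urlopen only inside this block, so no real request is sent.
    canned = FakeResponse({"name": "Oslo"})
    with mock.patch.object(request, "urlopen", return_value=canned) as fake:
        name = fetch_station_name("010010")
    # Return the parsed result and the URL the code tried to call.
    return name, fake.call_args[0][0]
```

A test can then check both the parsed value and the URL that would have been requested, without depending on the network or on the remote service being up.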

...

Chat with the rOpenSci team at upcoming meetings

You can find members of the rOpenSci team at various meetings and workshops around the world. Come say ‘hi’, learn about how our packages can enable your research, or about our onboarding process for contributing new packages, discuss software sustainability or tell us how we can help you do open and reproducible research....

Parse NOAA Integrated Surface Data Files

A new package isdparser is on CRAN. isdparser was in part liberated from rnoaa, then improved. We’ll use isdparser in rnoaa soon.

isdparser does not download files for you from NOAA’s FTP servers. The package focuses on parsing the files, which are variable-length ASCII strings stored line by line, where each line has some mandatory data and any amount of optional data.
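The general shape of such a line, a fixed mandatory part followed by zero or more optional sections, can be sketched as follows. This is a Python illustration of the parsing idea only: the field widths, the three-letter tags, and the two-digit length prefix are a made-up toy layout, not the real ISD format and not how isdparser works internally.

```python
def parse_line(line):
    """Parse one line of a toy fixed-width format (NOT the real ISD layout).

    Assumed mandatory part: 4-char station, 8-char date, 5-char signed
    temperature in tenths of a degree. Assumed optional sections: a 3-char
    tag, a 2-digit payload length, then the payload itself.
    """
    record = {
        "station": line[0:4],
        "date": line[4:12],
        "temp_c": int(line[12:17]) / 10,
    }
    optional = {}
    pos = 17
    # Walk the rest of the line one optional section at a time.
    while pos < len(line):
        tag = line[pos:pos + 3]
        size = int(line[pos + 3:pos + 5])
        optional[tag] = line[pos + 5:pos + 5 + size]
        pos += 5 + size
    record["optional"] = optional
    return record


# One line with only mandatory data, one with two optional sections.
short = parse_line("A00119010101+0123")
long = parse_line("B00219010102-0045WND040270SLP0510132")
```

Because every line carries its own optional sections, lines in the same file can differ in length, which is why a parser has to scan each line rather than split on fixed columns alone.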

The data is great and includes, for example, wind speed and direction, temperature, cloud data, sea level pressure, and more. It covers approximately 35,000 stations worldwide, though coverage is best in North America, Europe, and Australia. Data go all the way back to 1901 and are updated daily.

...

Community Call v12 - How do I create a code of conduct for my event/lab/codebase?

In order to facilitate a transformation towards open and reproducible research, rOpenSci is building and improving not only the technical infrastructure, but the social infrastructure as well. To support this, occasionally a Community Call will focus on a topic that reflects the values of rOpenSci. The first of these, on Thursday, December 15th, 8-9 AM PST, will be on “How do I create a code of conduct for my event/lab/codebase?”....

Working together to push science forward

Happy rOpenSci users can be found at