Posts with the "ocr" tag

Tesseract 4 is here! State of the art OCR in R!

November 6, 2018

By: Jeroen Ooms

Last week Google and friends released the new major version of their OCR system: Tesseract 4. This release builds on more than two years of hard work and completely overhauls the internal OCR engine. From the tesseract wiki:

"Tesseract 4.0 includes a new neural network-based recognition engine that delivers significantly higher accuracy (on document images) than the previous versions, in return for a significant increase in required compute power. On complex languages however, it may actually be faster than base Tesseract."

Community Call - Working with images in R

October 24, 2018

By: Stefanie Butland

rOpenSci’s software engineer / postdoc Jeroen Ooms will explain what images are, under the hood, and showcase several rOpenSci packages that form a modern toolkit for working with images in R, including opencv, av, tesseract, magick and pdftools.

🕘 Thursday, November 15, 2018, 10-11AM PST; 7-8PM CET (find your timezone)

☎️ Find all details for joining the call on our Community Calls page. Everyone is welcome. No RSVP needed.

Support for hOCR and Tesseract 4 in R

February 14, 2018

By: Jeroen Ooms

Earlier this month we released a new version of the tesseract package to CRAN. This package provides R bindings to Google’s open source optical character recognition (OCR) engine Tesseract. Two major new features are support for hOCR and support for the upcoming Tesseract 4.

hOCR output

Support for hOCR output was requested by one of our users on GitHub. The ocr() function gains a parameter HOCR which allows for returning results in hOCR format:

Tesseract and Magick: High Quality OCR in R

August 17, 2017

By: Jeroen Ooms

Last week we released an update of the tesseract package to CRAN. This package provides R bindings to Google’s OCR library Tesseract.

install.packages("tesseract")

The new version ships with the latest libtesseract 3.05.01 on Windows and macOS. Furthermore, it includes enhancements for managing language data and using tesseract together with the magick package.

Installing Language Data

The new version has several improvements for installing additional language data. On Windows and macOS you use the tesseract_download() function to install additional languages:
