rOpenSci | Blog

The Antarctic/Southern Ocean rOpenSci community

Antarctic/Southern Ocean science and rOpenSci

Collaboration and reproducibility are fundamental to Antarctic and Southern Ocean science, and the value of data to Antarctic science has long been promoted. The Antarctic Treaty (which came into force in 1961) included the provision that scientific observations and results from Antarctica should be openly shared. The high cost and difficulty of acquisition mean that data tend to be re-used for different studies once collected. Further, there are many common data requirement themes (e.g. sea ice information is useful to a wide range of activities, from voyage planning through to ecosystem modelling). Support for Antarctic data management is well established. The SCAR-COMNAP Joint Committee on Antarctic Data Management was established in 1997 and remains active as a SCAR Standing Committee today.

...

Tesseract 4 is here! State of the art OCR in R!

Last week Google and friends released the new major version of their OCR system: Tesseract 4. This release builds upon 2+ years of hard work and has completely overhauled the internal OCR engine. From the tesseract wiki:

“Tesseract 4.0 includes a new neural network-based recognition engine that delivers significantly higher accuracy (on document images) than the previous versions, in return for a significant increase in required compute power. On complex languages however, it may actually be faster than base Tesseract.”

...

Sharing the Recipe for rOpenSci’s Unconf Ice Breaker

While many people groan at the thought of participating in a group ice breaker activity, we’ve gotten consistent feedback from people who have been to recent rOpenSci unconferences:

“Best ice breaker ever!”

We’ve had lots of requests for a detailed description of how we do it. This post shares our recipe, including a script you can adapt, a reflection on its success, examples of how others have used it, and some tips to remember. Let us know in the comments if you’ve used or adapted it!

...

Community Call - Working with images in R

rOpenSci’s software engineer / postdoc Jeroen Ooms will explain what images are, under the hood, and showcase several rOpenSci packages that form a modern toolkit for working with images in R, including opencv, av, tesseract, magick and pdftools.

🕘 Thursday, November 15, 2018, 10-11AM PST; 7-8PM CET (find your timezone)

☎️ Find all details for joining the call on our Community Calls page. Everyone is welcome. No RSVP needed.

Magick: quantize, histogram

Agenda

  1. Welcome (Stefanie Butland, rOpenSci Community Manager, 5 min)
  2. Working with images in R (Jeroen Ooms, 35 min)
  3. Q & A (20 min)

Abstract

Images in various forms are used for numerous applications across scientific disciplines. Whether you are observing through satellite or microscope, looking at MRI scans or petri dishes, trying to find patterns or abnormalities, the data is in the image. Unfortunately the tools for working with images are traditionally highly fragmented by field, and often narrow in scope. At rOpenSci we are working on a suite of general-purpose packages based on powerful C/C++ libraries. These provide an extensible and interoperable foundation for working with images in R, which can be used to implement domain-specific methods. This talk gives a taste of things we can currently do with images in R, and highlights some of the ongoing developments and challenges.

...

pubchunks: extract parts of scholarly XML articles

pubchunks is a package that grew out of the fulltext package. fulltext provides a single interface to many sources of full text scholarly articles. As part of the user flow in fulltext there is an extraction step where fulltext::ft_chunks() pulls parts of articles out of XML-format article files.

As part of making fulltext more maintainable and focused on simply fetching articles, and realizing that pulling out bits of structured XML files is a more general problem, we broke out pubchunks into a separate package. fulltext::ft_chunks() and fulltext::ft_tabularize() will eventually be removed and we’ll point users to pubchunks.

...
