A bit more than one year ago, rOpenSci launched its new website design, created by designer Maru Lango. Not only did the website's appearance change (for the better!), but the underlying framework did too. ropensci.org is powered by Hugo, like blogdown! Over the last few months, we’ve made the best of this framework, hopefully improving your browsing experience (and trapping you into binge reading). In this note, we’ll go over the main developments, as well as give some Hugo tips....
In late November 2018, we ran the third annual rOpenSci ozunconf. This is the sibling rOpenSci unconference, held in Australia. We ran the first ozunconf in Brisbane in 2016, and the second in Melbourne in 2017.
Photos taken by Ajay from Fotoholics
As usual, before the unconf, we started discussion on GitHub issue threads, and the excitement built as the number of issues grew.
The day before the unconf we ran “Day 0 training” - an afternoon explaining R packages and GitHub. This aimed to show people how to create an R package, put it under version control with git, then put it on GitHub and share it with others. The idea behind delivering this course was not necessarily to have people become experts in R package development and GitHub. Instead, it aimed to gently introduce the ideas and concepts of R packages and GitHub, so that people could hit the ground running over the next two days.
...We have released updates for the rOpenSci text analysis tools. This technote will highlight some of the major improvements in the spelling package and also the underlying hunspell package, which provides the spelling engine for the spelling package.
install.packages("spelling")
Update to the latest versions to use these cool new features!
Automatic Checking of README and NEWS files
Users who are already using spelling on their packages might discover a few new typos! The new version of spelling now also checks the readme.md, news.md, and index.md files in your package root directory.
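To try this outside of a package, a single markdown file can be checked directly. The following is a minimal sketch: the temporary README file and its typos are made up for illustration, and the check only runs if the spelling package is installed.

```r
# Write a throwaway README with two deliberate typos (hypothetical content).
readme <- file.path(tempdir(), "README.md")
writeLines(c("# demo", "", "This pakage checks your speling."), readme)

if (requireNamespace("spelling", quietly = TRUE)) {
  # spell_check_files() reports words the dictionary does not know,
  # together with the file and line where each one was found.
  typos <- spelling::spell_check_files(readme, lang = "en_US")
  print(typos)
}
```

Inside a package directory, `spelling::spell_check_package()` runs the same kind of check over the package's documentation files in one call.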
The Ecology Hackathon
Almost one year ago now, ecologists filled a room for the “Ecology Hackathon: Developing R Packages for Accessing, Synthesizing and Analyzing Ecological Data”, co-organised by rOpenSci Fellow Nick Golding and Methods in Ecology and Evolution. This hackathon was part of the “Ecology Across Borders” Joint Annual Meeting 2017 of BES, GfÖ, NecoV, and EEF in Ghent. Groups formed at different tables to work on different ideas to implement as R packages. At our table, we were around ten people who knew next to nothing about what we were aiming for. We barely knew each other and nobody had clear expectations, just the desire to learn more about R packages. We were interested in a common idea posted as a wishlist item in the rOpenSci community: building an R package to interact with CITES and its Speciesplus database. CITES (the Convention on International Trade in Endangered Species of Wild Fauna and Flora) is an international agreement between governments that provides key information to ensure that international trade in specimens of wild animals and plants does not threaten their survival. At 10 am, nobody had a clear idea of where to start. By 6 pm, we had a functional prototype of the rcites package, which was really rewarding and motivated us to follow up on the package development. We did great teamwork, met new researchers, and learned a bunch of new stuff. This was definitely a successful hackathon!
A new version of pdftools has been released to CRAN. Go get it while it’s hot:
install.packages("pdftools")
This version has two major improvements: low level text extraction and encoding improvements.
About PDF textboxes
A PDF document may seem to contain paragraphs or tables in a viewer, but this is not actually true. PDF is a printing format: a page consists of a series of unrelated lines, bitmaps, and textboxes with a given size, position and content. Hence a table in a PDF file is really just a large unordered set of lines and words that are nicely positioned visually. This makes sense for printing, but makes extracting text or data from a PDF file extremely difficult.
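To make the idea of an unordered set of positioned words concrete, here is a small sketch in base R. The data frame is made up for illustration (it is not output from a real file); it only mimics the kind of per-word position and content that low level extraction, such as `pdftools::pdf_data()`, gives you. Rebuilding rows then comes down to grouping words by vertical position and sorting them left to right.

```r
# Hypothetical word boxes for one page: position (x, y) and content.
# The values are invented and deliberately stored out of reading order.
boxes <- data.frame(
  x    = c(120, 20, 20, 70, 70, 120),
  y    = c(10, 30, 10, 10, 30, 30),
  text = c("count", "oak", "species", "site", "A", "12"),
  stringsAsFactors = FALSE
)

# Sort top to bottom, then left to right.
boxes <- boxes[order(boxes$y, boxes$x), ]

# Words that share a vertical position form one visual row.
rows <- split(boxes$text, boxes$y)
lines <- vapply(rows, paste, character(1), collapse = " ")
print(unname(lines))
```

In a real document, words on the same visual line may differ by a pixel or two in their vertical position, so an exact match on `y` is an assumption of this sketch; rounding or clustering the positions would likely be needed.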
...