Tuesday, August 1, 2017 From rOpenSci (https://ropensci.org/blog/2017/08/01/emldown/). Except where otherwise noted, content on this site is licensed under the CC-BY license.
How do you get the maximum value out of a dataset? Data is most valuable when it can easily be shared, understood, and used by others. This requires some form of metadata that describes the data. While metadata can take many forms, the most useful metadata is that which follows a standardized specification. The Ecological Metadata Language (EML) is an example of such a specification originally developed for ecological datasets. EML describes what information should be included to describe the data, and what format that information should be represented in.
There are many benefits to writing metadata according to a specification like EML:
Despite these benefits, it’s not always fun to write standardized metadata. Although there’s a very good package for helping you create the EML, rOpenSci’s EML package, documenting the data can be quite tedious. Furthermore, before you share the data on a public repository that enforces EML, the only prize you get is a happy conscience, which isn’t very tangible. In our unconf project, we created immediate gratification for EML users: a package that transforms the non-human readable EML file into a pretty documentation website for any dataset!
As we mentioned above, EML is a metadata standard originally created for the ecologists. In practice it’s a set of XML schema documents, telling you what you need to document (e.g. the dataset creator, geographic coverage of the data, etc.) and how to build those documents (format of the XML). The
EML R package provides tools to create and read EML without needing to learn about XML thanks to helper functions and extensive documentation. Although its name contains the word ecological, one can use EML for documenting datasets from any field. One of our team members uses it for documenting epidemiological datasets, because everything she wants to document is present in the standard.
After creating EML documentation for your data, you get an XML document that’s, well… not very human readable.
EML package there’s a function called
eml_view relying on the
listviewer package to produce an interactive view of the XML in the Viewer pane of say, RStudio, which allows one to check some things quickly but which is far from being a user-friendly representation of the metadata. Our goal with
emldown was to bridge this gap and provide a quick and easy way to take EML and convert it to a more user-friendly web page.
After installing the package from Github, you can apply it to your EML…
devtools::install_github("ropenscilabs/emldown") library("emldown") render_eml(path_to_my_eml)
and you get something like this!
This format is much more likely to make you and your collaborators happy because it’s more engaging and easier to explore to find useful information. Note how little effort you needed to invest into making it! Viewing your metadata in this way makes it easy to read, and easy to spot any errors.
The resulting website is based on Bootstrap and has some interactive components:
Geographic information turns into a map, made using leaflet:
Right now, we are able to capture some of the most common parts of Ecological Metadata Language, including the Title, Abstract, Authors, Keywords, Coverage (where in space and time the samples were taken), the Data Tables and Units associated with these. Over time we plan to add support for additional components of the EML specification.
You can see a live example of a website created with
Use the package to transform the EML you have lying around on your PC into a pretty website, and if you find a bug while doing so we’ll be happy to tackle it! Report any issue or feature request here, or feel free to contribute with code.
We’re very happy to have been able to create this working package in two days (and thankful for that opportunity, thanks rOpenSci!); and we not-so-secretly hope that it will contribute to making writing good metadata more attractive and therefore more common.