Posts with the "archiving" tag

Lessons Learned from rtika, a Digital Babel Fish

April 25, 2018

By:   Sasha Goodman

The Apache Tika parser is like the Babel fish in Douglas Adam’s book, “The Hitchhikers’ Guide to the Galaxy” 1. The Babel fish translates any natural language to any other. Although Tika does not yet translate natural language, it starts to tame the tower of babel of digital document formats. As the Babel fish allowed a person to understand Vogon poetry, Tika allows an analyst to extract text and objects from Microsoft Word.

The challenge of combining 176 x #otherpeoplesdata to create the Biomass And Allometry Database

June 3, 2015

By:   Daniel Falster  |   Rich FitzJohn  |   Remko Duursma  |   Diego Barneche

Despite the hype around “big data”, a more immediate problem facing many scientific analyses is that large-scale databases must be assembled from a collection of small independent and heterogeneous fragments – the outputs of many and isolated scientific studies conducted around the globe. Collecting and compiling these fragments is challenging at both political and technical levels. The political challenge is to manage the carrots and sticks needed to promote sharing of data within the scientific community.

Reproducible research is still a challenge

June 9, 2014

By:   Rich FitzJohn  |   Matt Pennell  |   Amy Zanne  |   Will Cornwell

Science is reportedly in the middle of a reproducibility crisis. Reproducibility seems laudable and is frequently called for (e.g., nature and science). In general the argument is that research that can be independently reproduced is more reliable than research that cannot be independently reproduced. It is also worth noting that reproducing research is not solely a checking process, and it can provide useful jumping-off points for future research questions. It is difficult to find a counter-argument to these claims, but arguing that reproducibility is laudable in general glosses over the fact that for each research group it is a significant amount of work to make their research (easily) reproducible for independent scientists.

dvn - Sharing Reproducible Research from R

February 20, 2014

By:   Thomas Leeper

Reproducible research involves the careful, annotated preservation of data, analysis code, and associated files, such that statistical procedures, output, and published results can be directly and fully replicated. As the push for reproducible research has grown, the R community has responded with an increasingly large set of tools for engaging in reproducible research practices (see, for example, the ReproducibleResearch Task View on CRAN). Most of these tools focus on improving one’s own workflow through closer integration of data analysis and report generation.

Page 1 of 1