Posts with the "reproducible research" tag
Checklist Recipe - How we created a template to standardize species data
November 20, 2018
Imagine you are a fish ecologist who compiled a list of fish species for your country. 🐟
Your list could be useful to others, so you publish it as a supplementary file to an article or in a research repository. That is fantastic, but it might be difficult for others to discover your list or combine it with other lists of species. Luckily, there’s a better way to publish species lists: as a standardized checklist that can be harvested and processed by the Global Biodiversity Information Facility (GBIF).
outcomerate: Transparent Communication of Quality in Social Surveys
October 2, 2018
Background: Surveys are ubiquitous in the social sciences, and the best of them are meticulously planned out. Statisticians often decide on a sample size based on a theoretical design, and then proceed to inflate this number to account for “sample losses”. This ensures that the desired sample size is achieved, even in the presence of non-response. Factors that reduce the pool of interviews include participant refusals, inability to contact respondents, deaths, and frame inaccuracies.
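The inflation step described above can be illustrated with a little arithmetic. This is only a minimal sketch, not the outcomerate package's API: the function name `inflate_sample_size` and the single `expected_response_rate` parameter are hypothetical, and lumping all sample losses into one rate is a simplifying assumption.

```python
import math


def inflate_sample_size(target_completes: int, expected_response_rate: float) -> int:
    """Return how many cases to field so that, on average, the target
    number of completed interviews is reached.

    Hypothetical helper: every source of sample loss (refusals,
    non-contacts, deaths, frame errors) is folded into one rate.
    """
    if target_completes < 0:
        raise ValueError("target_completes must be non-negative")
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must be in (0, 1]")
    # Round up: fielding a fraction of a case is not possible.
    return math.ceil(target_completes / expected_response_rate)


# A design calling for 400 completed interviews, with half of the
# sampled cases expected to respond, needs 800 cases fielded.
print(inflate_sample_size(400, 0.5))
```

In practice the expected rate is itself an estimate, which is one reason to report the outcome rates actually achieved.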
Building Reproducible Data Packages with DataPackageR
September 18, 2018
Sharing data sets for collaboration or publication has always been challenging, but it’s become increasingly problematic as complex and high-dimensional data sets have become ubiquitous in the life sciences. Studies are large and time-consuming: data collection takes time, and both the data analysis and the software used to carry it out are moving targets.
In the vaccine space (where I work) we analyze collections of high-dimensional immunological data sets from a variety of different technologies (RNA sequencing, cytometry, multiplexed antibody binding, and others).
The challenge of combining 176 x #otherpeoplesdata to create the Biomass And Allometry Database
June 3, 2015
Despite the hype around “big data”, a more immediate problem facing many scientific analyses is that large-scale databases must be assembled from a collection of small, independent, and heterogeneous fragments – the outputs of many isolated scientific studies conducted around the globe.
Collecting and compiling these fragments is challenging at both political and technical levels. The political challenge is to manage the carrots and sticks needed to promote sharing of data within the scientific community.
Introducing Rocker: Docker for R
October 23, 2014
You only know two things about Docker. First, it uses Linux containers. Second, the Internet won’t shut up about it.
– attributed to Solomon Hykes, Docker CEO
So what is Docker? Docker is a relatively new open source application and service, which is seeing interest across a number of areas. It uses recent Linux kernel features (containers, namespaces) to shield processes. While its use (superficially) resembles that of virtual machines, it is much more lightweight as it operates at the level of a single process (rather than an emulation of an entire OS layer).