Posts with the "reproducibility" tag
Building Reproducible Data Packages with DataPackageR
September 18, 2018
Sharing data sets for collaboration or publication has always been challenging, but it’s become increasingly problematic as complex, high-dimensional data sets have become ubiquitous in the life sciences. Studies are large and time-consuming: data collection takes time, and data analysis is a moving target, as is the software used to carry it out.
In the vaccine space (where I work) we analyze collections of high-dimensional immunological data sets from a variety of different technologies (RNA sequencing, cytometry, multiplexed antibody binding, and others).
drake's improved high-performance computing power
May 18, 2018
The drake R package is not only a reproducible research solution, but also a serious high-performance computing engine. The package website introduces drake, and this technical note draws from the guides on high-performance computing and timing in the drake manual.
You can help! Some of these features are brand new, and others are newly refactored. The GitHub version has all the advertised functionality, but it needs more testing and development before I can submit it to CRAN in good conscience.
treeio: Phylogenetic data integration
May 17, 2018
Phylogenetic trees are commonly used to present the evolutionary relationships of species. Newick is the de facto standard format in phylogenetics for representing trees. The Nexus format incorporates Newick tree text with related information organized into separate units known as blocks. For the R community, the ape and phylobase packages import trees from the Newick and Nexus formats. However, analysis results (tree + analysis findings) from widely used software packages in this field are not well supported.
The prequel to the drake R package
February 6, 2018
The drake R package is a pipeline toolkit. It manages data science workflows, saves time, and adds more confidence to reproducibility. I hope it will impact the landscapes of reproducible research and high-performance computing, but I originally created it for different reasons. This post is the prequel to drake’s inception. There was struggle, and drake was the answer.
Dissertation frustration
Sisyphus. https://sites.google.com/site/sisyphusa/
My dissertation project was intense. The final computational challenge was to analyze multiple genomics datasets using an emerging method and its competitors.
Data validation with the assertr package
April 11, 2017
This is cross-posted from Tony's blog, onthelambda.com. Version 2.0 of my data set validation package assertr hit CRAN just this weekend. It has some pretty great improvements over version 1. For those new to the package, what follows is a short, new introduction. For those who are already using assertr, the text below will point out the improvements.
I can go on and on (and have) about the treachery of messy/bad datasets.