Posts with the "reproducibility" tag

Building Reproducible Data Packages with DataPackageR

September 18, 2018

By:   Greg Finak

Sharing data sets for collaboration or publication has always been challenging, but it’s become increasingly problematic as complex and high-dimensional data sets have become ubiquitous in the life sciences. Studies are large and time-consuming: data collection takes time, and data analysis is a moving target, as is the software used to carry it out. In the vaccine space (where I work), we analyze collections of high-dimensional immunological data sets from a variety of different technologies (RNA sequencing, cytometry, multiplexed antibody binding, and others).

drake's improved high-performance computing power

May 18, 2018

By:   Will Landau

The drake R package is not only a reproducible research solution, but also a serious high-performance computing engine. The package website introduces drake, and this technical note draws from the guides on high-performance computing and timing in the drake manual. You can help! Some of these features are brand new, and others are newly refactored. The GitHub version has all the advertised functionality, but it needs more testing and development before I can submit it to CRAN in good conscience.

treeio: Phylogenetic data integration

May 17, 2018

By:   Guangchuang Yu

Phylogenetic trees are commonly used to represent evolutionary relationships among species. Newick is the de facto format in phylogenetics for representing trees. The Nexus format incorporates Newick tree text along with related information organized into separate units known as blocks. For the R community, we have the ape and phylobase packages to import trees from Newick and Nexus formats. However, analysis results (tree + analysis findings) from widely used software packages in this field are not well supported.

The prequel to the drake R package

February 6, 2018

By:   Will Landau

The drake R package is a pipeline toolkit. It manages data science workflows, saves time, and increases confidence in reproducibility. I hope it will impact the landscapes of reproducible research and high-performance computing, but I originally created it for different reasons. This post is the prequel to drake’s inception. There was struggle, and drake was the answer. Dissertation frustration (image: Sisyphus, https://sites.google.com/site/sisyphusa/). My dissertation project was intense. The final computational challenge was to analyze multiple genomics datasets using an emerging method and its competitors.

Data validation with the assertr package

April 11, 2017

By:   Tony Fischetti

This is cross-posted from Tony's blog, onthelambda.com. Version 2.0 of my data set validation package assertr hit CRAN just this weekend. It has some pretty great improvements over version 1. For those new to the package, what follows is a short, new introduction. For those who are already using assertr, the text below will point out the improvements. I can go on and on (and have) about the treachery of messy/bad datasets.
