Ambitious workflows in R, such as machine learning analyses, can be difficult to manage. A single round of computation can take several hours to complete, and routine updates to the code and data tend to invalidate hard-earned results. You can enhance the maintainability, hygiene, speed, scale, and reproducibility of such projects with the drake R package. drake resolves the dependency structure of your analysis pipeline, skips tasks that are already up to date, executes the rest with optional distributed computing, and organizes the output so you rarely have to think about data files. This talk demonstrates how to create and maintain a realistic machine learning project using drake-powered automation....
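The dependency resolution and skipping behavior described above can be sketched with drake's core verbs `drake_plan()`, `make()`, and `readd()`. This is a minimal illustration, not the talk's actual pipeline; the target names (`dataset`, `fit`, `slope`) are invented for the example.

```r
library(drake)

# Declare the pipeline as a plan: each target is an R expression,
# and drake infers the dependency structure between targets
# (fit depends on dataset, slope depends on fit).
plan <- drake_plan(
  dataset = data.frame(x = 1:100, y = 2 * (1:100) + rnorm(100)),
  fit     = lm(y ~ x, data = dataset),
  slope   = coef(fit)[["x"]]
)

make(plan)   # first run: builds dataset, then fit, then slope
make(plan)   # second run: all targets are up to date, nothing rebuilds
readd(slope) # retrieve a target from drake's cache on disk
```

If you later edit only the code for `slope`, a third `make(plan)` rebuilds just that one target, which is how hours-long computations survive routine updates.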
Are you passionate about statistical methods and software? If so, we would love for you to join our team to dig deep into the world of statistical software packages. You’ll develop standards for evaluating and reviewing statistical tools, publish your findings, and work closely with an international team of experts to set up a new software review system.
We are seeking a creative, dedicated, and collaborative software research scientist to support a two-year project to launch a new software peer-review initiative. The software research scientist will work on the Sloan Foundation-supported rOpenSci project, with rOpenSci staff and a statistical methods editorial board. They will research and develop standards and review guidelines for statistical software, publish findings, and develop R software to test packages against those standards. The software research scientist will work with staff and the board to collaborate broadly with the statistical and software communities to gather input, refine and promote the standards, and recruit editors and peer reviewers. The candidate must be self-motivated, proactive, collaborative, and comfortable working openly and reproducibly with a broad online community.
...The grainchanger package provides functionality for data aggregation to a coarser resolution via moving-window or direct methods.
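To make the distinction concrete, here is a plain-R sketch of the two approaches on a toy grid. This is a conceptual illustration only, not the grainchanger API; the helper names `direct_agg` and `moving_window` are invented for the example.

```r
# An 8 x 8 fine-resolution grid to be aggregated to 2 x 2.
fine <- matrix(1:64, nrow = 8)

# Direct method: summarise each 4 x 4 block of fine cells
# with a single statistic (here, the mean).
direct_agg <- function(m, factor, fun = mean) {
  nr <- nrow(m) / factor
  nc <- ncol(m) / factor
  out <- matrix(NA_real_, nr, nc)
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      rows <- ((i - 1) * factor + 1):(i * factor)
      cols <- ((j - 1) * factor + 1):(j * factor)
      out[i, j] <- fun(m[rows, cols])
    }
  }
  out
}

# Moving-window method: first replace each fine cell with a
# statistic over its 3 x 3 neighbourhood, so fine-scale spatial
# structure is captured before aggregation.
moving_window <- function(m, fun = mean) {
  out <- m
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      rows <- max(1, i - 1):min(nrow(m), i + 1)
      cols <- max(1, j - 1):min(ncol(m), j + 1)
      out[i, j] <- fun(m[rows, cols])
    }
  }
  out
}

coarse_direct <- dire_agg_result <- direct_agg(fine, factor = 4)
coarse_window <- direct_agg(moving_window(fine), factor = 4)
```

Swapping `fun` for something like `var` shows why the moving-window route matters: the direct method discards within-block heterogeneity, while the smoothed surface retains neighbourhood-scale information.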
Why do we need new methods for data aggregation?
As landscape ecologists and macroecologists, we often need to aggregate data in order to harmonise datasets. In doing so, we often lose a lot of information about the spatial structure and environmental heterogeneity of data measured at finer resolution.
The issues around scale disconnects are both conceptual and practical:
...We’re delighted to announce that we have received new funding from the Alfred P. Sloan Foundation. The $678K grant, awarded through the Foundation’s Data & Computational Research program, will be used to expand our efforts in software peer review.
Software peer review has become a core part of rOpenSci, helping to improve scientific software quality, drive best engineering practices into scientific communities, and build community and collaboration through open, constructive reviews. We’re excited to expand our work in this important area. Here’s what we’ll be doing in the next two years with this support:
...Our 1-hour Call on Reproducible Research with R will include three speakers and 20 minutes for Q & A.
Ben Marwick will introduce you to the research compendium: a collection of files that accompanies, enhances, or is itself a scientific publication, providing the data, code, and documentation needed to reproduce a scientific workflow.
From Karthik Ram you will learn about holepunch, an R package that takes any GitHub repo with R scripts and R Markdown files and quickly turns it into a free, live RStudio server where anyone can run your code!
...