Monday, April 3, 2023 From rOpenSci (https://ropensci.org/blog/2023/04/03/cran-to-git/). Except where otherwise noted, content on this site is licensed under the CC-BY license.
Last month we explained how r-universe makes it easy to search and browse through the countless R packages, articles, and datasets to let you discover and learn new things. We are continuously growing this database by adding more R projects, to guide you through everything the R ecosystem has to offer.
Currently r-universe is tracking and indexing over 18,000 R packages. These are a mix of packages found on popular networks like CRAN or Bioconductor, and packages that were registered by users.
In previous posts we already explained how to create your personal CRAN-like repository and publish packages on r-universe yourself. This post explains the other part: how the scraper automatically finds packages on CRAN and Bioconductor that should be included in r-universe.
For R packages to be trackable by r-universe, the source has to be publicly accessible via Git [1]. Most packages in r-universe are found on GitHub, but in fact any Git server is allowed.
We strongly prefer tracking projects from their official upstream Git source, where the authors commit changes and where users report bugs. The Git source provides a lot of useful information such as:
R-universe automatically analyses all this information, uses it to rank and classify packages, and presents the data via the r-universe.dev web user interfaces and APIs. For this reason we really want to know the official Git URL and owner, even when a copy of the package exists on CRAN or Bioconductor.
For all R packages on CRAN and Bioconductor we perform the following steps to try to find the upstream Git source URL:
First, we check the URL and BugReports fields in the DESCRIPTION file for a github/gitlab/bitbucket/r-forge URL. If the package can be found here, this is the preferred method.
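The DESCRIPTION lookup could be sketched roughly like this in Python. It is only an illustration, not the scraper's actual code: the function names, the regular expression, and the choice to scan both BugReports and URL are assumptions made for this example, and r-forge links are left out for brevity.

```python
import re

# Git hosts named in the post. The pattern itself is an assumption for this sketch.
GIT_HOST_PATTERN = re.compile(
    r"https?://(?:www\.)?"
    r"(github\.com|gitlab\.com|bitbucket\.org)"
    r"/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)",
    re.IGNORECASE,
)


def parse_description(text):
    """Parse DESCRIPTION text ("Field: value" lines) into a dict of fields."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and current is not None:
            # Indented line: continuation of the previous field.
            fields[current] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            current = key.strip()
            fields[current] = value.strip()
    return fields


def find_git_url(description_text):
    """Return a normalized repository URL, or None if no known host is found."""
    fields = parse_description(description_text)
    for name in ("BugReports", "URL"):
        match = GIT_HOST_PATTERN.search(fields.get(name, ""))
        if match:
            host, owner, repo = match.groups()
            if repo.endswith(".git"):
                repo = repo[:-4]
            return f"https://{host.lower()}/{owner}/{repo}"
    return None


example = """Package: demo
Version: 1.0
URL: https://demo.example.org,
    https://github.com/someuser/demo
BugReports: https://github.com/someuser/demo/issues
"""
print(find_git_url(example))  # https://github.com/someuser/demo
```

Stripping the trailing `/issues` part means a bug tracker link and a repository link resolve to the same upstream source.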
This list of package URLs is updated every night and published in crantogit. Today’s statistics are:
Currently we do not process CRAN/Bioc packages that have no public Git source and whose maintainer has no GitHub account, because we cannot determine the owner (and hence the r-universe subdomain).
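To make that rule concrete, here is a small Python sketch of the decision. The function name, its parameters, and the lowercasing of the owner are hypothetical; the sketch only assumes that each universe lives under its owner's subdomain of r-universe.dev.

```python
def universe_subdomain(git_owner=None, maintainer_login=None):
    """Pick the r-universe subdomain for a package, or None if it must be skipped.

    git_owner: owner taken from the upstream Git URL, if one was found.
    maintainer_login: GitHub login of the maintainer, if one is known.
    """
    owner = git_owner or maintainer_login
    if not owner:
        # No Git source and no GitHub account: the owner is unknown.
        return None
    return f"{owner.lower()}.r-universe.dev"


print(universe_subdomain("rOpenSci"))             # ropensci.r-universe.dev
print(universe_subdomain(None, "some-maintainer"))  # some-maintainer.r-universe.dev
print(universe_subdomain())                       # None
```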
This is roughly how it works, but there are some caveats. For example, the scraper may not be able to find a package if it is stored in an unusual subdirectory within a Git repository. Also, CRAN has an unusual practice of unpredictably archiving and unarchiving packages. Therefore, packages that get archived on CRAN and are not part of any other registry still remain on r-universe for 2 months.
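The grace period for archived packages can be pictured with a short Python sketch. The names are hypothetical, and treating "2 months" as 60 days is an assumption; the post does not say how the period is measured.

```python
from datetime import date, timedelta

# "2 months" approximated as 60 days (assumption for this sketch).
GRACE_PERIOD = timedelta(days=60)


def should_keep(archived_on, in_other_registry, today):
    """Decide whether a package stays listed.

    archived_on: date the package was archived on CRAN, or None if it is not archived.
    in_other_registry: True if another registry still lists the package.
    today: the date on which the check runs.
    """
    if archived_on is None or in_other_registry:
        return True
    return today - archived_on <= GRACE_PERIOD


today = date(2023, 4, 3)
print(should_keep(date(2023, 3, 1), False, today))  # True: archived 33 days ago
print(should_keep(date(2023, 1, 1), False, today))  # False: archived 92 days ago
```

A package that is unarchived within the window never disappears, which smooths over brief archive and unarchive cycles.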
If you maintain an R package, regardless of where you publish it, I highly recommend these two things to let us (and others) identify the official source and maintainer of the project:
Add the URL and BugReports fields to the package DESCRIPTION file when you publish on CRAN/Bioconductor [2]. This makes it clear where to report bugs, and also prevents confusion about the official source if someone forks your package or creates a package with the same name.
Finally, I want to emphasize again that packages do not need to be on CRAN or Bioconductor to be included in r-universe. It is super easy to set up your own universe and get the same benefits!
[1] One notable exception is r-forge, which uses SVN but has a live Git mirror on github.com/r-forge.
[2] You can do it manually or by running