rOpenSci | Make Your R Package Easier to Cite

Make Your R Package Easier to Cite

Scientists rarely cite the research software they use as part of a research project. As a consequence, the software and the time spent developing and maintaining it become an invisible scholarly contribution. Furthermore, this lack of visibility means that incentives to produce high-quality, sustainable software are missing. Among the many reasons why software is not cited, one is the lack of clear citation information from package developers. In this tech note we provide some tips on how to make it really easy to cite your software.

We also describe some hurdles that users of your package might face when they want to cite it, and briefly present how we monitor the literature to find use cases of our packages. We are planning to include this topic in our package dev guide. By commenting on this post you can help us strengthen our package citation guidance! Thanks to Adam Sparks for commenting on a draft of this post!

Clear citation rules for your package

[Image: person wearing a cardboard box on their head. Caption: "Why didn’t you cite my package?!" Photo by cottonbro on Pexels.]

To make it really easy for users to cite your package, you should store citation metadata in the expected places and advertise it very clearly.

  • Create and populate the CITATION file, as thousands of R packages have done (3,386 out of 16,002 packages on CRAN as of 15 January 2021 [1]). It’s easy to create a boilerplate file with usethis::use_citation() (update to usethis >= 2.2.0).
  • Archive each release of your GitHub repo on Zenodo and add the Zenodo top-level DOI to the CITATION file.
  • If your software has a clear research application, you can also publish a paper in a venue such as the Journal of Open Source Software or the Journal of Open Research Software. You can then add that software publication to your CITATION file.
  • Less related to your package itself than to what it builds on: if your package wraps a particular resource such as a data source or, say, a statistical algorithm, remind users how to cite that resource. One option is to add a reference for the resource to your CITATION file with its own bibentry().

As an example, see the dynamite CITATION file, which refers to both the manual and two related preprints.

citHeader("To cite dynamite in publications use:")

bibentry(
  key = "dynamitepaper",
  bibtype  = "Misc",
  doi = "10.48550/ARXIV.2302.01607",
  url = "https://arxiv.org/abs/2302.01607",
  author = c(person("Santtu", "Tikka"), person("Jouni", "Helske")),
  title = "dynamite: An R Package for Dynamic Multivariate Panel Models",
  publisher = "arXiv",
  year = "2023"
)

bibentry(
  key = "dmpmpaper",
  bibtype  = "Misc",
  title    = "Estimating Causal Effects from Panel Data with Dynamic
    Multivariate Panel Models",
  author = c(person("Santtu", "Tikka"), person("Jouni", "Helske")),
  publisher = "SocArxiv",
  year     = "2022",
  url      = "https://osf.io/preprints/socarxiv/mdwu5/"
)

bibentry(
  key = "dynamite",
  bibtype  = "Manual",
  title    = "Bayesian Modeling and Causal Inference for Multivariate
    Longitudinal Data",
  author = c(person("Santtu", "Tikka"), person("Jouni", "Helske")),
  note  = "R package version 1.0.0",
  year     = "2022",
  url      = "https://github.com/ropensci/dynamite"
)
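Along the same lines, here is a minimal sketch of a CITATION file that carries a Zenodo DOI and a separate entry for a wrapped resource. Everything in it is an assumption made for illustration: the package name mypkg, the author, the "Example Data Service", and the placeholder DOI are hypothetical. The file contents are written to a temporary file so that utils::readCitationFile() can confirm that R parses them.

```r
# Sketch of a CITATION file for a hypothetical package "mypkg".
# All names, versions, URLs, and DOIs below are placeholders, not real records.
citation_lines <- c(
  'citHeader("To cite mypkg in publications use:")',
  '',
  '# The package itself, with the top-level Zenodo DOI',
  'bibentry(',
  '  key     = "mypkg",',
  '  bibtype = "Manual",',
  '  title   = "mypkg: Access to the Example Data Service",',
  '  author  = person("Jane", "Doe"),',
  '  note    = "R package version 0.1.0",',
  '  year    = "2023",',
  '  doi     = "10.5281/zenodo.0000000"',
  ')',
  '',
  '# The wrapped resource, so that users can cite it too',
  'bibentry(',
  '  key     = "exampledata",',
  '  bibtype = "Misc",',
  '  title   = "Example Data Service",',
  '  author  = person("Example Consortium"),',
  '  year    = "2020",',
  '  url     = "https://example.org/data"',
  ')'
)

# Write the sketch to a temporary file and check that R can parse it.
citation_file <- file.path(tempdir(), "CITATION")
writeLines(citation_lines, citation_file)
cit <- utils::readCitationFile(citation_file, meta = list(Encoding = "UTF-8"))

length(cit)                 # number of entries found in the file
print(cit, style = "text")  # plain-text rendering of each entry
```

In a real package the lines inside citation_lines would live directly in inst/CITATION, and the same readCitationFile() call could serve as a quick check before a release.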
  • Direct potential readers to the preferred citation in the README by adding boilerplate text along the lines of “here’s how to cite my package”. See e.g. the ecmwfr README.

“Personally, I take a “belt-and-suspenders” approach and still put citation things in the README.” Noam Ross, rOpenSci forum

“I [advertise the citation info in the README] too, just to make it painfully obvious how to cite the work.” Adam Sparks, rOpenSci forum

Although some authors use on-load messages to encourage citations, we discourage this practice and recommend that developers highlight this information in their README and documentation.

Why is it hard to cite software?

Despite your best efforts to encourage users to cite your software, you might still run into challenges. Authors may have limits on the number of references they can cite in a journal, or face resistance from their coauthors. Other authors may simply be unaware that they can use citation("packagename") to retrieve the citation information for an R package, or that they should cite the package at all, because software citation has not been commonly promoted.
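For readers who have not used it, here is a minimal sketch of what citation() offers on the user side. The package name is a stand-in: "utils" ships with R, so the example runs anywhere, and the output file name is an arbitrary choice made for illustration.

```r
# "utils" ships with R and stands in for the package you actually used.
pkg <- "utils"

cit <- citation(pkg)        # reads the package's citation metadata
print(cit, style = "text")  # human-readable reference

# Convert to BibTeX and save it, e.g. for import into a reference manager.
bib <- toBibtex(cit)
bib_file <- file.path(tempdir(), paste0(pkg, ".bib"))
writeLines(bib, bib_file)
```

For a package that provides a CITATION file, the same calls return the entries the developer listed there, which is why populating that file pays off.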

These problems can’t all be fixed at once by one motivated individual (neither you the developer nor they the user), so more advocacy and teaching are needed. In the meantime, how do we adapt software citation guidelines to realistically accommodate all situations?

How rOpenSci tracks package usage

At rOpenSci we monitor the scientific literature to discover uses and mentions of our packages, which you can browse on our citations page. Because packages are sometimes used but not listed in the references section, we report any usage of the packages in papers, not only formal citations. Here’s Scott Chamberlain’s workflow.

As part of our new Moore Foundation-funded effort, we are building a system to automatically detect R package citations in the literature and append them to package records in R-universe. Stay tuned on our blog for more details.

Conclusion

In this post we shared guidance on how to help your R package land in the references section of the papers that used it [2]. We also mentioned some hurdles even well-meaning users might face, and explained how we track usage of our packages in the scientific literature. We encourage you to share your experience and wisdom in the comments below, as we are planning to consolidate them with our own content to add a new topic to our dev guide.


  1. Thanks to Mark Padgham for computing this in a CRAN mirror.

  2. Note that this post is not about promoting usage of your package, which is covered briefly in the dev guide.