Thursday, June 16, 2022 From rOpenSci (https://ropensci.org/blog/2022/06/16/publicize-api-client-yes-no/). Except where otherwise noted, content on this site is licensed under the CC-BY license.
These days web Application Programming Interfaces (APIs) are everywhere (scientific data sources, your customer relationship management system, cat facts API…). Do you need to write some R code wrapping a web resource such as an API? Packaging it up might be useful to you or your team for the same reasons as any other code. Now, whether you really want to publicize the package and to guarantee its maintenance might be slightly trickier than for other packages, as the usefulness and status of your package will depend on the web API being up and running according to expectations. This creates a surface for failures that might be more or less scary depending on your trust in the upstream maintainers.
In this post, we will go over whether you should bother maintaining a package wrapping a web API, and we will suggest useful resources.
In a world where we have great R packages to interact with internet resources (httr, httr2, curl, etc.), one might wonder if it’s worth writing an API package rather than using these packages directly:
httr2::request("https://cat-fact.herokuapp.com/facts") |>
httr2::req_perform() |>
httr2::resp_body_json() |>
purrr::map_chr("text")
#> [1] "Cats make about 100 different sounds. Dogs make only about 10."
#> [2] "Domestic cats spend about 70 percent of the day sleeping and 15 percent of the day grooming."
#> [3] "I don't know anything about cats."
#> [4] "The technical term for a cat’s hairball is a bezoar."
#> [5] "Cats are the most popular pet in the United States: There are 88 million pet cats and 74 million dogs."
However, even if it does not do extremely complex things under the hood, the mere existence of your package can hugely lower the barrier to API usage for some R users.
Reading package docs rather than web API docs can, for instance, reduce the effort needed.
Other aspects that your package can simplify are:
- pagination (for example in gh, requested through the gh::gh() function but done under the hood by gh::gh_next());
- setting a user agent (for example gh: User-Agent: "https://github.com/r-lib/gh", rmangal: User-Agent: "rmangal").
A particularly tricky aspect your package can simplify is authentication. Authentication means that certain APIs ask users to identify themselves before granting access. It comes in several flavors: provided API keys, OAuth, or HTTP authentication (see https://rapidapi.com/blog/api-glossary/api-authentication for examples). Your package can both simplify it and promote security best practices! For instance, your package should not accept the API key only as a function argument, as that would encourage users to write the API key in their scripts. Examples of packages simplifying authentication: gh, rtweet, opencage (whose docs encourage the use of the keyring package for storing credentials).
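As a minimal sketch of the API key advice above, a package could look the key up in an environment variable by default. The function name get_api_key() and the variable name CATFACTS_API_KEY are made up for illustration; they do not come from any of the packages mentioned.

```r
# Hypothetical helper: read the API key from an environment variable so that
# users never need to paste the key into their scripts.
# "CATFACTS_API_KEY" is a made-up variable name used for illustration.
get_api_key <- function(key = Sys.getenv("CATFACTS_API_KEY")) {
  if (!nzchar(key)) {
    stop(
      "No API key found. Set the CATFACTS_API_KEY environment variable, ",
      "for instance in your .Renviron file.",
      call. = FALSE
    )
  }
  key
}
```

The key can still be passed explicitly, but the default nudges users toward keeping it out of their code.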
Your package might be even more useful if it wraps not just one but several web APIs, providing a unified interface to different data sources. Examples: specieshindex, spocc, and taxize.
One of the most difficult aspects of API R packages (or any R package really, but even more so here) is striking the right balance in the complexity/flexibility trade-off. In an attempt to simplify things, you might end up, for instance, catching all HTTP errors at once (with httr::http_error()) and returning a generic error message, while each HTTP code has a precise meaning about what went wrong (see for example qualtRics’ qualtrics_response_code() function, which returns custom errors depending on the HTTP error).
Alternatively, in an attempt to preserve the API’s flexibility and to make sure you offer all options to the user, you might end up creating a package that is as complex as, if not more complex than, direct calls to the API.
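To illustrate the point about HTTP errors, here is a rough sketch of a helper that turns a status code into a specific error instead of a generic one. The function name and the messages are hypothetical; a real API documents what each code means for its own endpoints.

```r
# Hypothetical helper translating an HTTP status code into a specific error.
# The messages are illustrative only.
stop_for_api_status <- function(status) {
  # Codes below 400 are not errors: return the status invisibly.
  if (status < 400) {
    return(invisible(status))
  }
  msg <- switch(
    as.character(status),
    "401" = "Unauthorized: check that your API key is set and valid.",
    "403" = "Forbidden: your account may lack access to this resource.",
    "404" = "Not found: check the identifier you requested.",
    "429" = "Too many requests: you hit the rate limit, retry later.",
    # Fallback for any other error code
    sprintf("The API returned HTTP status %s.", status)
  )
  stop(msg, call. = FALSE)
}
```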
An interesting pattern might be for your package to provide both high-level and low-level functions, where low-level functions can support API calls that the high-level functions do not cover (or do not cover yet). Expert users could then use the lower-level functions to access more detailed features of the API with more flexibility, while less experienced users would use higher-level functions that have simpler outputs. The gh package only offers low-level functions, which are used, for instance, in the incredibly useful PR helpers in usethis.
Similarly, some packages, such as rtweet (via the parse argument), offer the option of getting either the raw output or the output parsed into rectangular data. The zbank package offers a low-level API that gives back the unparsed answer as a nested list with complete information; it also offers a high-level API that sends back parsed data.frames.
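The low-level/high-level split could look roughly like the sketch below, which reuses the cat facts URL from the example at the start of this post. The function names api_get(), parse_facts() and get_facts() are invented for illustration and are not taken from rtweet or zbank.

```r
# Low-level function (hypothetical): performs the request and returns the
# unparsed response body as a nested list.
api_get <- function(path, base_url = "https://cat-fact.herokuapp.com") {
  httr2::request(base_url) |>
    httr2::req_url_path_append(path) |>
    httr2::req_perform() |>
    httr2::resp_body_json()
}

# Helper turning the nested list into a data.frame, keeping only the
# "text" field of each element.
parse_facts <- function(raw) {
  data.frame(text = vapply(raw, function(x) x[["text"]], character(1)))
}

# High-level function: parsed output by default, raw output on request.
get_facts <- function(parse = TRUE) {
  raw <- api_get("facts")
  if (parse) parse_facts(raw) else raw
}
```

Expert users can call api_get() with any path the API supports, while get_facts() covers the common case with a simpler output.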
A web API might change: for instance its output could evolve, breaking your code.
Before writing an R package, you might want to assess whether the API is well maintained, and you might even want to contact the API maintainers to get their blessing. Alternatively, they might advise you to wait a couple of months because major breaking changes are in the pipeline. Contacting them is also a way to see whether they are responsive (and maybe to get moral support for your package). This step is useful if the API is, say, a small scientific data source (two of us, who built rromeo, discovered on the day the package was approved by rOpenSci that the API had released a brand new version, with no functions in common with the previous version).
Now, even a big commercial API (à la Twitter, wrapped by rtweet) could bite you: pricing could change, features could get dropped, etc. Besides, commercial APIs do not necessarily offer great rates for client developers, so your involvement might depend on what subscription you have for other reasons (now, if you lose your subscription, for instance by changing jobs, you might no longer be interested in maintaining a package anyway).
Getting informed in advance won’t prevent bad surprises but should still help. Keeping informed (via a changelog, a newsletter, regular manual checks, tests with real requests) might also help you see changes early. In theory, adding a custom user agent (a small string sent to the server to identify yourself when you make a query) with contact details (a link to your package development repository) might allow API maintainers to contact you if your package is causing issues or if it’s interacting with (soon to be) deprecated endpoints.
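With httr2, setting a custom user agent might look like the sketch below. The package name "catfacts" and the repository URL are placeholders, not a real package.

```r
# Build a request that identifies the (hypothetical) package to the server.
# "catfacts" and the repository URL are placeholders for illustration.
catfacts_request <- function(base_url = "https://cat-fact.herokuapp.com") {
  httr2::request(base_url) |>
    httr2::req_user_agent("catfacts (https://github.com/example/catfacts)")
}
```

Every function of the package that talks to the API could then start from catfacts_request(), so the user agent is set in a single place.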
One thing to keep in mind is that no matter how many flags you plant in your documentation, users of the R package might file bug reports with your package instead of with the API maintainers. It can help if you, for example, implement a once-per-session reminder of the link to the API and its citation to clear up the confusion.
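One possible way to implement such a once-per-session reminder, using only base R, is to keep a flag in an environment. The helper name and the wording of the message are assumptions made for this sketch.

```r
# Environment remembering whether the reminder was already shown
# during the current R session.
.reminder_state <- new.env(parent = emptyenv())

# Hypothetical helper, to be called at the start of user-facing functions.
# Returns TRUE (invisibly) if the reminder was shown, FALSE otherwise.
remind_about_api <- function() {
  if (isTRUE(.reminder_state$shown)) {
    return(invisible(FALSE))
  }
  .reminder_state$shown <- TRUE
  message(
    "Data come from the cat facts API (https://cat-fact.herokuapp.com). ",
    "Please report data issues to the API maintainers."
  )
  invisible(TRUE)
}
```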
For the case when an API is flaky, that is to say often down, you might want to add warnings to your documentation and retries to your code (see for instance httr2::req_retry()).
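A small sketch of retries with httr2 follows. The function name flaky_request() and the choice of three tries are arbitrary and only meant as an illustration.

```r
# Hypothetical request builder for a flaky API: httr2 will try the request
# up to three times when it hits an error it considers transient.
flaky_request <- function(url) {
  httr2::request(url) |>
    httr2::req_retry(max_tries = 3)
}
```

The retries only happen when the request is performed, for instance with httr2::req_perform().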
Assess potential usage before spending too much effort on your package. Of course usage might depend on your efforts: your package might make a data source more accessible to users who would feel less at ease writing httr2 code themselves; and your promotion efforts might make a wider audience tap into the API your package is wrapping.
You might also still want to develop an API package as a way to learn general package development and maintenance skills, and to display these skills to a wider audience.
If you make your package public but are not sure whether you want to commit to maintaining it in the longer term, make sure the docs clearly state the repo status or lifecycle.
Could the package be developed automatically based on an OpenAPI (formerly called Swagger) specification? Maybe, but there is currently no established tool for that, so you would first need to develop the R package creating R packages. 😉
Worth exploring are:
Now, of course not all APIs have an OpenAPI specification.
More generally you might find the R-hub blog post “Code generation in R packages” relevant.
Now, if you are still motivated to maintain an API package for a small (your team?) or big (the world!) audience, we recommend:
If you need to write code wrapping an API, you can obviously always package it up and follow best practices. Now, whether to publicize the resulting tool and guarantee its maintenance is a conscious decision to make. You should also regularly reassess whether you still want to maintain the package. Onboard contributors regularly to share the load, so that you could potentially leave completely at some point without the package getting orphaned. Good luck, we hope you get all 200’s! Feel free to POST about your own experiences below.