rOpenSci | HTTP tools


Interact with Web Resources

graphql

A GraphQL Query Parser

Jeroen Ooms
Description

Bindings to the libgraphqlparser C++ library. Parses GraphQL syntax and exports the AST in JSON format.

View Documentation

robotstxt

A robots.txt Parser and Webbot/Spider/Crawler Permissions Checker

Jordan Bradford
Description

Provides functions to download and parse robots.txt files. The package makes it easy to check whether bots (spiders, crawlers, scrapers, …) are allowed to access specific resources on a domain.
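The permission check described above can be illustrated with Python's standard-library `urllib.robotparser`. This is not the package's R interface, only a minimal sketch of the same idea; it parses a made-up robots.txt held in memory (the rules, bot names and `example.org` URLs are hypothetical) so no network access is needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice it would be downloaded
# from https://<domain>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def make_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, bot: str, url: str) -> bool:
    """Return True if the parsed rules permit `bot` to fetch `url`."""
    return parser.can_fetch(bot, url)


rules = make_parser(ROBOTS_TXT)
print(is_allowed(rules, "MyCrawler", "https://example.org/index.html"))        # True
print(is_allowed(rules, "MyCrawler", "https://example.org/private/data.csv"))  # False
print(is_allowed(rules, "BadBot", "https://example.org/index.html"))           # False
```

Rules for a named bot are checked before the catch-all `*` group, which is why `BadBot` is refused everywhere while other bots are only kept out of `/private/`.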

View Documentation
Scientific use cases
  1. Dogucu, M., & Çetinkaya-Rundel, M. (2020). Web Scraping in the Statistics and Data Science Curriculum: Challenges and Opportunities. Journal of Statistics Education, 1–11. https://doi.org/10.1080/10691898.2020.1787116
vcr
CRAN

Record HTTP Calls to Disk

Scott Chamberlain
Description

Records test-suite HTTP requests and replays them during future runs. A port of the Ruby gem of the same name (https://github.com/vcr/vcr/). Works by hooking into the webmockr R package to match HTTP requests by various rules (HTTP method, URL, query parameters, headers, body, etc.) and caching real HTTP responses on disk in cassettes. Subsequent HTTP requests that match a request already recorded in the same cassette receive the cached response.
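The record-then-replay cycle can be sketched in a few lines of Python. This is not vcr's R API: the `Cassette` class, the JSON file layout and the decision to match only on HTTP method and URL are assumptions for illustration (the description above lists richer match rules). The "real" transport is a plain function so the sketch runs offline.

```python
import json
import os
import tempfile
from typing import Callable, Dict, Tuple

# Hypothetical transport: takes (method, url), returns (status, body).
Transport = Callable[[str, str], Tuple[int, str]]


class Cassette:
    """Minimal record/replay store keyed on (HTTP method, URL)."""

    def __init__(self, path: str, transport: Transport) -> None:
        self.path = path
        self.transport = transport
        self.interactions: Dict[str, dict] = {}
        # Load previously recorded interactions, if the cassette exists.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                self.interactions = json.load(fh)

    @staticmethod
    def _key(method: str, url: str) -> str:
        return f"{method.upper()} {url}"

    def request(self, method: str, url: str) -> Tuple[int, str]:
        key = self._key(method, url)
        if key in self.interactions:
            # Replay: a matching request was recorded earlier.
            hit = self.interactions[key]
            return hit["status"], hit["body"]
        # Record: make the real call, then persist the response to disk.
        status, body = self.transport(method, url)
        self.interactions[key] = {"status": status, "body": body}
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(self.interactions, fh, indent=2)
        return status, body


if __name__ == "__main__":
    calls = []

    def fake_network(method: str, url: str) -> Tuple[int, str]:
        calls.append((method, url))
        return 200, f"response for {url}"

    path = os.path.join(tempfile.mkdtemp(), "cassette.json")
    Cassette(path, fake_network).request("GET", "https://example.org/api")
    # A fresh Cassette simulates a later test run reading from disk.
    Cassette(path, fake_network).request("GET", "https://example.org/api")
    print(len(calls))  # 1: the second run was served from the cassette
```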

View Documentation
webmockr
CRAN

Stubbing and Setting Expectations on HTTP Requests

Scott Chamberlain
Description

Provides tools for stubbing HTTP requests and setting expectations on them, including conditions on the expected request and on the response to return. Requests can be matched on HTTP method, query parameters, request body, headers and more. Can be used in unit tests or outside a testing context.
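Request stubbing boils down to a registry of canned responses, each guarded by conditions the outgoing request must satisfy. The Python sketch below shows that matching logic; it is not webmockr's R API, and the `Stub`/`StubRegistry` names, the exact-match rule for query parameters and the error raised for unmatched requests are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple
from urllib.parse import parse_qs, urlsplit


@dataclass
class Stub:
    """A canned response plus the conditions a request must meet to get it."""
    method: str
    url: str                                  # scheme://host/path, no query
    query: Optional[Dict[str, str]] = None    # expected query params (exact match)
    headers: Optional[Dict[str, str]] = None  # headers that must be present
    body: Optional[str] = None                # expected exact request body
    status: int = 200
    response_body: str = ""


@dataclass
class StubRegistry:
    stubs: List[Stub] = field(default_factory=list)

    def stub_request(self, stub: Stub) -> None:
        self.stubs.append(stub)

    def handle(self, method: str, url: str,
               headers: Optional[Dict[str, str]] = None,
               body: Optional[str] = None) -> Tuple[int, str]:
        parts = urlsplit(url)
        base = f"{parts.scheme}://{parts.netloc}{parts.path}"
        # parse_qs returns lists; keep the first value of each parameter.
        query = {k: v[0] for k, v in parse_qs(parts.query).items()}
        # Header names are case-insensitive, so compare them lowercased.
        sent = {k.lower(): v for k, v in (headers or {}).items()}
        for s in self.stubs:
            if s.method.upper() != method.upper() or s.url != base:
                continue
            if s.query is not None and s.query != query:
                continue
            if s.headers and any(sent.get(k.lower()) != v
                                 for k, v in s.headers.items()):
                continue
            if s.body is not None and s.body != body:
                continue
            return s.status, s.response_body
        # No stub matched: refuse instead of hitting the real network.
        raise RuntimeError(f"Unregistered request: {method.upper()} {url}")
```

For example, after `reg.stub_request(Stub("GET", "https://example.org/items", query={"page": "2"}, response_body="[]"))`, a call to `reg.handle("GET", "https://example.org/items?page=2")` returns the canned `(200, "[]")`, while `?page=3` raises.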

View Documentation
crul
CRAN

HTTP Client

Scott Chamberlain
Description

A simple HTTP client, with tools for making HTTP requests and mocking HTTP requests. The package is built on R6 and takes inspiration from Ruby's faraday gem (https://rubygems.org/gems/faraday). The package name is a play on curl, the widely used command-line tool for HTTP; the package itself is built on top of the R package curl, an interface to libcurl (https://curl.se/libcurl/).

View Documentation
wdman
CRAN

Webdriver/Selenium Binary Manager

Jonathan Völkle
Description

There are a number of binary files associated with the Webdriver/Selenium project. This package provides functions to download these binaries and to manage processes involving them.

View Documentation
binman
CRAN

A Binary Download Manager

Jonathan Völkle
Description

Tools and functions for managing the download of binary files. Binary repositories are defined in YAML format. Defining new pre-download, download and post-download templates allows additional repositories to be added.

View Documentation
RSelenium
CRAN

R Bindings for Selenium WebDriver

Jonathan Völkle
Description

Provides a set of R bindings for the Selenium 2.0 WebDriver (see https://www.selenium.dev/documentation/ for more information) using the JsonWireProtocol (see https://github.com/SeleniumHQ/selenium/wiki/JsonWireProtocol for more information). Selenium 2.0 WebDriver allows driving a web browser natively, as a user would, either locally or on a remote machine using the Selenium server; it marks a leap forward in web browser automation. Using RSelenium you can automate browsers locally or remotely.
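Under the JsonWireProtocol, each browser command is an HTTP request carrying JSON to the Selenium server. The Python sketch below only builds (and does not send) three such requests to show their shape. It is not RSelenium's R API; the server address `http://localhost:4444/wd/hub` is an assumed common default, and the session ID `abc123` used in the usage note is a placeholder (a real one is returned by the server in response to `POST /session`).

```python
import json
from typing import Optional, Tuple

# Assumed address of a locally running Selenium server.
BASE = "http://localhost:4444/wd/hub"

Request = Tuple[str, str, Optional[str]]  # (HTTP method, URL, JSON body)


def new_session_request(browser: str) -> Request:
    """Command asking the server to start a new browser session."""
    body = {"desiredCapabilities": {"browserName": browser}}
    return "POST", f"{BASE}/session", json.dumps(body)


def navigate_request(session_id: str, url: str) -> Request:
    """Command telling the browser in `session_id` to load `url`."""
    return "POST", f"{BASE}/session/{session_id}/url", json.dumps({"url": url})


def title_request(session_id: str) -> Request:
    """Command reading the current page title (no request body)."""
    return "GET", f"{BASE}/session/{session_id}/title", None
```

For instance, `navigate_request("abc123", "https://example.org")` yields a `POST` to `.../session/abc123/url` with body `{"url": "https://example.org"}`.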

View Documentation
Scientific use cases
  1. Silva, D., & Meireles, F. (2015). Ciência Política na era do Big Data: automação na coleta de dados digitais. Politica Hoje, 2, 87–102. https://github.com/meirelesff/meirelesff.github.io/raw/master/files/bigdata2016.pdf
  2. Nousiainen, K., Kanduri, K., Ricaño-Ponce, I., Wijmenga, C., Lahesmaa, R., Kumar, V., & Lähdesmäki, H. (2018). snpEnrichR: analyzing co-localization of SNPs and their proxies in genomic regions. Bioinformatics. https://doi.org/10.1093/bioinformatics/bty460
  3. Blankers, M., van der Gouwe, D., & van Laar, M. (2019). 4-Fluoramphetamine in the Netherlands: Text-mining and sentiment analysis of internet forums. International Journal of Drug Policy, 64, 34–39. https://doi.org/10.1016/j.drugpo.2018.11.016
  4. Krah, F.-S., Bates, S., & Miller, A. (2019). rMyCoPortal - an R package to interface with the Mycology Collections Portal. Biodiversity Data Journal, 7. https://doi.org/10.3897/bdj.7.e31511
  5. Lee, A. J., Jones, B. C., & DeBruine, L. M. (2019, January 21). Investigating the association between mating-relevant self-concepts and mate preferences through a data-driven analysis of online personal descriptions. https://doi.org/10.31234/osf.io/38zef
  6. Mitchell, J. M., & Moseley, H. N. B. (2019). Deriving Accurate Lipid Classification based on Molecular Formula. https://doi.org/10.1101/572883
  7. Rybinski, K. (2019). A machine learning framework for automated analysis of central bank communication and media discourse. The case of Narodowy Bank Polski. Bank & Credit, 50(1), 1–20. http://bankikredyt.nbp.pl/content/2019/01/BIK_01_2019_01.pdf
  8. Fioravanti, G., Piervitali, E., & Desiato, F. (2019). A new homogenized daily data set for temperature variability assessment in Italy. International Journal of Climatology. https://doi.org/10.1002/joc.6177
  9. Roh, T., Jeong, Y., Jang, H., & Yoon, B. (2019). Technology opportunity discovery by structuring user needs based on natural language processing and machine learning. PLOS ONE, 14(10), e0223404. https://doi.org/10.1371/journal.pone.0223404
  10. Nüst, D., Eddelbuettel, D., Bennett, D., Cannoodt, R., Clark, D., Daroczi, G., … & Marwick, B. (2020). The Rockerverse: Packages and Applications for Containerization with R. arXiv preprint arXiv:2001.10641 https://arxiv.org/pdf/2001.10641.pdf
  11. Salgado, D., & Oancea, B. (2020). On new data sources for the production of official statistics. arXiv preprint https://arxiv.org/pdf/2003.06797.pdf
  12. Fraser, N., Momeni, F., Mayr, P., & Peters, I. (2020). The relationship between bioRxiv preprints, citations and altmetrics. Quantitative Science Studies, 1–21. https://doi.org/10.1162/qss_a_00043
  13. Hannon, B. A., Fairfield, W. D., Adams, B., Kyle, T., Crow, M., & Thomas, D. M. (2020). Use and abuse of dietary supplements in persons with diabetes. Nutrition & Diabetes, 10(1). https://doi.org/10.1038/s41387-020-0117-6
  14. Stringham, O., Toomes, A., Kanishka, A. M., Mitchell, L., Heinrich, S., Ross, J. V., & Cassey, P. (2020). A guide to using the Internet to monitor and quantify the wildlife trade. https://ecoevorxiv.org/5yzw9/download?format=pdf
  15. Bisbee, J., & Honig, D. (2020). Flight to Safety: 2020 Democratic Primary Election Results and COVID-19. Covid Economics, 3(10), 54-84. http://www.amcham-egypt.org/bic/pdf/corona1/Covid%20Economics%20by%20CEPR.pdf
  16. Göbel, S. (2020). Voting and Social Media-Based Political Participation. https://doi.org/10.31235/osf.io/sjq4g
  17. Mancosu, M., & Vegetti, F. (2020). What You Can Scrape and What Is Right to Scrape: A Proposal for a Tool to Collect Public Facebook Data. Social Media + Society, 6(3), 205630512094070. https://doi.org/10.1177/2056305120940703
  18. Gessa, A., Jiménez, A., & Sancha, P. (2020). Open Innovation in Digital Healthcare: Users’ Discrimination between Certified and Non-Certified mHealth Applications. Journal of Open Innovation: Technology, Market, and Complexity, 6(4), 130. https://doi.org/10.3390/joitmc6040130
  19. Simpson, R. B., Gottlieb, J., Zhou, B., Hartwick, M. A., & Naumova, E. N. (2021). Completeness of open access FluNet influenza surveillance data for Pan-America in 2005–2019. Scientific Reports, 11(1). https://doi.org/10.1038/s41598-020-80842-9
ghql
CRAN

General Purpose GraphQL Client

Mark Padgham
Description

A GraphQL client, with an R6 interface for initializing a connection to a GraphQL instance, and methods for constructing queries, including fragments and parameterized queries. Queries are checked with the libgraphqlparser C++ parser via the graphql package.
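A GraphQL client typically sends the query text, with any fragment definitions appended, plus a separate `variables` object as the JSON body of an HTTP POST. The Python sketch below builds such a payload to show how fragments and parameterized queries fit together. It is not ghql's R6 API, and the schema fields (`repository`, `name`, `description`) are hypothetical.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical fragment: a reusable selection of fields on a Repository type.
REPO_FIELDS = """
fragment RepoFields on Repository {
  name
  description
}
"""

# Parameterized query: $owner and $name are supplied as variables
# rather than being spliced into the query string.
QUERY = """
query RepoInfo($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    ...RepoFields
  }
}
"""


def build_payload(query: str, fragments: str = "",
                  variables: Optional[Dict[str, Any]] = None) -> str:
    """Return the JSON body for a GraphQL POST request."""
    body: Dict[str, Any] = {"query": query + fragments}
    if variables:
        body["variables"] = variables
    return json.dumps(body)


print(build_payload(QUERY, REPO_FIELDS, {"owner": "ropensci", "name": "ghql"}))
```

Keeping variables out of the query text means the same query string can be reused (and checked by a parser once) while only the variable values change between calls.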

View Documentation