A growing set of R packages supports interacting with the web from R: low-level tools for working over HTTP (see RCurl and httr), parsers for data from the web (like RJSONIO and XML), and wrappers to web APIs that provide data (like twitteR).
Most of you probably know about CRAN Task Views, which aggregate information about R packages and functions on a particular subject area into a simple web page. There isn’t one for interacting with the web, so we have started drafting one on GitHub, and it is below.
Let us know what you think should be changed in the comments below or, ideally, send a pull request to the repo on GitHub.
CRAN Task View: Working with data on the web
- Maintainer: Scott Chamberlain, Karthik Ram, Christopher Gandrud
- Contact: scott at ropensci.org
- Version: 2013-09-09
Introduction
This Task View contains information about using R to obtain and parse data from the web.
The base version of R does not ship with many tools for interacting with the web. Thankfully, a growing number of contributed packages fill this gap.
If you have any comments or suggestions for additions or improvements, then please contact the maintainer.
A list of available packages and functions is presented below, grouped by the type of activity.
curl/http/ftp
- RCurl: a low level curl wrapper.
- httr: a light wrapper around RCurl that makes many things easier, but still allows you to access the lower level functionality of RCurl.
httr has convenient functions for the HTTP verbs: GET(), POST(), PUT(), DELETE(), PATCH(), HEAD(), BROWSE(). These wrap functions in RCurl, making them more convenient to use, though less configurable than their counterparts in RCurl. The closest equivalent of httr’s GET() in RCurl is getForm(). Likewise, the closest equivalent of httr’s POST() in RCurl is postForm().
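As a rough illustration of the correspondence described above, the sketch below wraps the same GET request once with httr and once with RCurl. The helper names `get_with_httr` and `get_with_rcurl`, and the URL in the usage comment, are placeholders invented for this example. Nothing is requested until one of the functions is called.

```r
# Hypothetical helpers (the names are ours, not part of either package).
# Both send an HTTP GET with query parameters and return the body as text.

get_with_httr <- function(url, params = list()) {
  resp <- httr::GET(url, query = params)
  httr::stop_for_status(resp)        # turn a 4xx/5xx response into an R error
  httr::content(resp, as = "text")   # response body as a character string
}

get_with_rcurl <- function(url, params) {
  # getForm() appends the named parameters to the URL as a query string
  RCurl::getForm(url, .params = params)
}

# Usage (needs network access and both packages installed):
# get_with_httr("http://example.com/search", list(q = "rstats"))
# get_with_rcurl("http://example.com/search", list(q = "rstats"))
```

The httr version gets an explicit status check and content extraction in two short calls, which is the kind of convenience the paragraph above refers to.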
Web frameworks
RStudio recently created Shiny, which combines R, HTML, CSS, and JavaScript to make web applications. Related tools are available, including OpenCPU and Rook. However, Shiny is the most promising of these.
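To give a feel for what a Shiny application looks like, here is a minimal sketch. It assumes a reasonably current Shiny release (one that provides fluidPage() and shinyApp()); the function name `make_demo_app` and the app itself are made up for illustration.

```r
# Hypothetical demo app: a slider controls the size of a random sample,
# and the histogram redraws whenever the slider moves.
make_demo_app <- function() {
  ui <- shiny::fluidPage(
    shiny::titlePanel("Random sample"),
    shiny::sliderInput("n", "Sample size", min = 10, max = 500, value = 100),
    shiny::plotOutput("hist")
  )

  server <- function(input, output) {
    # renderPlot() re-runs its expression each time input$n changes
    output$hist <- shiny::renderPlot(hist(rnorm(input$n), main = NULL))
  }

  shiny::shinyApp(ui = ui, server = server)
}

# Usage (needs the shiny package installed, run from an interactive session):
# shiny::runApp(make_demo_app())
```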
Parsing data from the web
- txt, csv, etc.: you can use read.csv() after acquiring the csv file from the web via, e.g., getURL() from RCurl. The repmis package contains a source_data command to simplify this process, while also assigning SHA-1 hashes to uniquely identify file versions.
- xml/html: the package XML by Duncan Temple Lang contains functions for parsing xml and html, and supports XPath for searching xml (think regex for strings).
- json/json-ld: RJSONIO by Duncan Temple Lang. Another package, rjson, covers many of the same tasks as RJSONIO.
- custom formats: Some web APIs provide custom data formats (e.g., X), which are usually modified xml or json, and handled by XML and RJSONIO, respectively.
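The parsing steps listed above can be sketched with small inline strings standing in for downloaded content, so no network access is needed. The sample data are made up for illustration, and the XML and RJSONIO parts only run if those packages are installed.

```r
# Made-up sample payloads standing in for text fetched with, e.g., getURL()
csv_text  <- "species,count\nSalmo trutta,12\nEsox lucius,3"
xml_text  <- paste0(
  "<records>",
  "<record><name>Salmo trutta</name></record>",
  "<record><name>Esox lucius</name></record>",
  "</records>"
)
json_text <- '{"species": "Salmo trutta", "count": 12}'

# csv: read.csv() accepts a string directly through its `text` argument
fish <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# xml: parse the string, then pull out node values with an XPath query
if (requireNamespace("XML", quietly = TRUE)) {
  doc <- XML::xmlParse(xml_text, asText = TRUE)
  xml_names <- XML::xpathSApply(doc, "//record/name", XML::xmlValue)
}

# json: convert a JSON object into an R object with named elements
if (requireNamespace("RJSONIO", quietly = TRUE)) {
  parsed <- RJSONIO::fromJSON(json_text)
}
```

In real use, the inline strings would be replaced by the text returned from the web request.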
Data sources
Ecological and evolutionary biology data
- rvertnet: A wrapper to the VertNet collections database API.
- rgbif: Interface to the Global Biodiversity Information Facility API methods
- rfishbase: A programmatic interface to fishbase.org.
- treebase: An R package for discovery, access and manipulation of online phylogenies
- taxize: Taxonomic information from around the web
- rfisheries: A programmatic interface to openfisheries.org
- dismo: Species distribution modeling, with wrappers to some APIs.
Not on CRAN
- rnbn: Access to the UK National Biodiversity Network data
- rWBclimate: R interface for the World Bank climate data
- rbison: Wrapper to the USGS BISON API
Genes/genomes
- cgdsr: R-Based API for accessing the MSKCC Cancer Genomics Data Server (CGDS).
- rsnps: Wrapper to the openSNP data API and the Broad Institute SNP Annotation and Proxy Search.
Earth Science Data
- RNCEP: Global weather and climate data at your fingertips.
- crn: Downloads and builds datasets for the Climate Reference Network.
- BerkeleyEarth: Data Input for Berkeley Earth Surface Temperature.
- waterData: An R Package for Retrieval, Analysis, and Anomaly Calculation of Daily Hydrologic Time Series Data.
Economics Data
- WDI: Search, extract and format data from the World Bank’s World Development Indicators.
- FAOSTAT: The package hosts a list of functions to download, manipulate, construct and aggregate agricultural statistics provided by the FAOSTAT database of the Food and Agriculture Organization of the United Nations.
Chemistry
- rpubchem: Interface to the PubChem Collection.
Agriculture
- cimis: R package for retrieving data from CIMIS.
Data depots and Mechanical Turk-like things
- MTurkR: Access to Amazon Mechanical Turk Requester API via R.
- factualR: Thin wrapper for the Factual.com server API.
- dataone: A package that provides read/write access to data and metadata from the DataONE network of Member Node data repositories.
- yhatr: yhatr lets you deploy, maintain, and invoke models via the Yhat REST API.
- rplos: A programmatic interface to the Web Service methods provided by the Public Library of Science journals for search.
- rmetadata: Get scholarly metadata from around the web.
Other data
- dvn: Provides access to The Dataverse Network API.
- sos4R: R client for the OGC Sensor Observation Service.
- datamart: Unified access to various data sources.
- rDrop: Dropbox interface.
Marketing
- anametrix: Bidirectional connector to Anametrix API
- bigml: BigML, a machine learning web service
CRAN packages:
- rfishbase
- treebase
- rdryad
- rgbif
- rplos
- RMendeley
- govdat
- OAIHarvester
- rdatamarket
- googlePublicData
- RWeather
- NCBI2
- RNCBIAxis2Libs
- RNCBIEUtilsLibs
- RNCBI
- ralastfm
- osmar
- metadata
- factualR
- rpubchem
- RTDAmeritrade
- sos4R
- SynergizeR
- twitteR