Thursday, April 16, 2020 From rOpenSci (https://ropensci.org/blog/2020/04/16/ebird-polygon/). Except where otherwise noted, content on this site is licensed under the CC-BY license.
This tech note about the peer-reviewed auk package was first posted on Matthew Strimas-Mackey’s website and kindly contributed to this blog. A first longer blog post from 2018 provides more context about the package.
One of the first things I took on when I started at the Cornell Lab of Ornithology was creating the auk R package for accessing eBird data. The entire eBird dataset can be downloaded as a massive text file, called the eBird Basic Dataset (EBD), and auk pulls out manageable chunks of the dataset based on various spatial, temporal, or taxonomic filters. I’m often asked “how do I extract data from within a polygon?” (usually “polygon” is replaced by “shapefile”, but I try to avoid that word since there’s good reasons to stop using shapefiles). Rather than answer these questions individually, I thought I’d do a quick post about how to do this with auk. Note that, at the time of posting, this requires auk version 0.4.1 or higher, which can be installed with:
install.packages("auk")
For more details on auk and eBird data in general, including how to get access to the EBD, it’s worth reading the first two chapters of the eBird Best Practices book. For the sake of speed and smaller file size, I’ll be working on a subset of the EBD containing all Northern Bobwhite records from 2019, which I obtained using the EBD custom download form, and you can access here. However, everything I’ll show in this post works equally as well (just a lot slower!) on the full EBD. For this example, let’s say we want to extract all records from within a polygon defining Bird Conservation Region 27 (Southeastern Coastal Plains). A GeoPackage of this region is available on the GitHub repository for the eBird Best Practices book, download it, place it in the data/
subdirectory of your RStudio project, then load it into R with:
library(sf)
library(auk)
library(dplyr)
poly <- read_sf("data/gis-data.gpkg", layer = "bcr")
If you have a shapefile, replace "data/gis-data.gpkg"
with the path to your shapefile and omit layer = "bcr"
. Now that we have a polygon, extracting eBird data is a two step process:
auk_bbox()
. This is necessary because due to the way auk works under the hood, it can only filter to ranges of latitudes and longitudes.Fortunately, step 1 is made easier by auk_bbox()
accepting spatial sf
or raster
objects and automatically calculating the bounding box for you. For example,
auk_ebd("data/ebd_norbob_201901_201912_relFeb-2020.txt") %>%
auk_bbox(poly)
Input
EBD: /home/maelle/Documents/ropensci/roweb2/content/technotes/2020-04-16-extracting-ebird-data-from-a-polygon/data/ebd_norbob_201901_201912_relFeb-2020.txt
Output
Filters not executed
Filters
Species: all
Countries: all
States: all
BCRs: all
Bounding box: Lon -91.6 - -75.5; Lat 29.3 - 37.3
Date: all
Start time: all
Last edited date: all
Protocol: all
Project code: all
Duration: all
Distance travelled: all
Records with breeding codes only: no
Complete checklists only: no
Notice that the output of the above command says Bounding box: Lon -91.6 - -75.5; Lat 29.3 - 37.3
, which are the bounds of the smallest square that contains the polygon. Let’s follow the method outlined in the Best Practices book to extract some data! We’ll get all observations on complete checklists from May to August inside the bounding box of the polygon:
f_out <- "data/ebd_norbob_poly.txt"
auk_ebd("data/ebd_norbob_201901_201912_relFeb-2020.txt") %>%
# define filters
auk_bbox(poly) %>%
auk_date(c("*-05-01", "*-08-31")) %>%
auk_complete() %>%
# compile and run filters
auk_filter(f_out)
The results were output to a file, which you can read in with read_ebd()
.
ebd <- read_ebd("data/ebd_norbob_poly.txt")
The data are now in a data frame and it’s time to proceed to step 2: further subset the data to only keep points within the polygon. First we’ll convert this data frame to a spatial sf
object using the latitude
and longitude
columns, then well use st_within()
to identify the points within the polygon, and use this to subset the data frame. Note that we have to be careful with our coordinate reference system here: crs = 4326
specifies that the EBD data are in unprojected, lat-long coordinates and we use st_transform()
to ensure the polygons and points are in the coordinate reference system.
# convert to sf object
ebd_sf <- ebd %>%
select(longitude, latitude) %>%
st_as_sf( coords = c("longitude", "latitude"), crs = 4326)
# put polygons in same crs
poly_ll <- st_transform(poly, crs = st_crs(ebd_sf))
# identify points in polygon
in_poly <- st_within(ebd_sf, poly_ll, sparse = FALSE)
although coordinates are longitude/latitude, st_within assumes that they are planar
# subset data frame
ebd_in_poly <- ebd[in_poly[, 1], ]
Finally, let’s create a simple map showing the EBD observations before (in black) and after (in green) subsetting the data to be within the polygon.
par(mar = c(0, 0, 0, 0))
plot(poly %>% st_geometry(), col = "grey40", border = NA)
plot(ebd_sf, col = "black", pch = 19, cex = 0.5, add = TRUE)
plot(ebd_sf[in_poly[, 1], ],
col = "forestgreen", pch = 19, cex = 0.5,
add = TRUE)
legend("top",
legend = c("All observations", "After spatial subsetting"),
col = c("grey40", "forestgreen"),
pch = 19,
bty = "n",
ncol = 2)
Looks like it worked! We got just the points within the polygon as intended. Two final notes:
read_ebd()
and skip straight to step 2.auk_ebd()
, subset both the EBD and sampling event data files separately to points within the polygon, the combine them together and zero-fill with auk_zerofill()
.auk users are welcome to submit issues on GitHub with bug reports or feature requests, or to email me directly. Should you be interested in contributing to auk, there is also a vignette specifically for adding functionality to the package.