March 19, 2020 From rOpenSci (https://ropensci.org/blog/2020/03/19/parzer/). Except where otherwise noted, content on this site is licensed under the CC-BY license.
parzer is a new package for handling messy geographic coordinates. The first version is now on CRAN, with binaries hopefully coming soon (see the installation note below). The package recently completed rOpenSci review.
The idea for this package started with a tweet from Noam Ross (https://twitter.com/noamross/status/1070733367522590721) about 15 months ago.
The motivation is that geographic coordinates often arrive in a messy format, or in many different formats at once. You can think of parzer as doing for geographic coordinates what lubridate does for dates.
The package is on CRAN, so you can install it with
install.packages("parzer")
However, because this package requires compilation, you probably want a binary. Binaries are not yet available on CRAN, but you can install one from the rOpenSci repository:
install.packages("parzer", repos = "https://dev.ropensci.org/")
Check out the package documentation to get started: https://docs.ropensci.org/parzer/
The following is a summary of the functions in the package and what they do:
Parse latitude or longitude separately
Parse latitudes and longitudes at the same time
Parse into separate parts of degrees, minutes, seconds
Pull out separately degrees, minutes, seconds, or hemisphere
Add/subtract degrees, minutes, seconds
parse latitudes and longitudes
lats <- c("40.123°", "40.123N74.123W", "191.89", 12, "N45 04.25764")
parse_lat(lats)
#> Warning in pz_parse_lat(lat): invalid characters, got: 40.123n74.123w
#> Warning in pz_parse_lat(lat): not within -90/90 range, got: 191.89
#> check that you did not invert lon and lat
#> [1] 40.12300      NaN      NaN 12.00000 45.07096
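The range check in that output can be sketched in a few lines of base R. This is a minimal sketch, not parzer's implementation: check_lat_range() is a hypothetical helper that warns about latitudes outside -90/90 and replaces them with NaN, the way parse_lat() treated 191.89 above.

```r
# Hypothetical helper, NOT parzer's implementation: warns about latitudes
# outside -90/90 and replaces them with NaN.
check_lat_range <- function(lat) {
  # NA/NaN inputs are left alone; only real out-of-range numbers are flagged
  out_of_range <- !is.na(lat) & (lat < -90 | lat > 90)
  if (any(out_of_range)) {
    warning("not within -90/90 range, got: ",
            paste(lat[out_of_range], collapse = ", "),
            call. = FALSE)
  }
  lat[out_of_range] <- NaN
  lat
}

check_lat_range(c(40.123, 191.89, 12, 45.07096))
```

A value like 191.89 is a plausible longitude, which is why parzer's warning also suggests checking that you did not swap longitude and latitude.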
longs <- c("45W54.2356", "181", 45, 45.234234, "-45.98739874N")
parse_lon(longs)
#> Warning in pz_parse_lon(lon): invalid characters, got: -45.98739874n
#> [1] -45.90393 181.00000  45.00000  45.23423       NaN
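Inputs such as "45W54.2356" and "N45 04.25764" combine whole degrees, decimal minutes, and a hemisphere letter. Once those pieces are pulled out of the string, the conversion to decimal degrees is short. The sketch below assumes the pieces are already separated; dm_to_decimal() is a hypothetical helper, not part of parzer, and it skips the string parsing that parzer actually does for you.

```r
# Hypothetical helper, not part of parzer: converts whole degrees plus
# decimal minutes and a hemisphere letter into signed decimal degrees.
dm_to_decimal <- function(deg, min, hemisphere) {
  value <- deg + min / 60
  # By convention, south (S) and west (W) are negative
  sign <- ifelse(toupper(hemisphere) %in% c("S", "W"), -1, 1)
  sign * value
}

dm_to_decimal(45, 54.2356, "W")   # pieces of the "45W54.2356" input
dm_to_decimal(45, 4.25764, "N")   # pieces of the "N45 04.25764" input
```

Both calls agree, to the printed precision, with the parse_lon() and parse_lat() results shown above (-45.90393 and 45.07096).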
Sometimes you may want to parse a geographic coordinate into its component parts (degrees, minutes, and seconds). parse_parts_lat and parse_parts_lon are what you need:
x <- c("191.89", 12, "N45 04.25764")
parse_parts_lon(x)
#> Warning in pz_parse_parts_lon(scrub(str)): invalid characters, got: n45 04.25764
#>   deg min      sec
#> 1 191  53 23.99783
#> 2  12   0  0.00000
#> 3  NA  NA      NaN
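To make the deg/min/sec columns concrete: 191.89 is 191 whole degrees plus 0.89 × 60 = 53.4 minutes, that is, 53 minutes plus 0.4 × 60 = 24 seconds. A minimal base-R sketch of that decomposition follows; dd_to_dms() is a hypothetical name, not a parzer function. It gives about 24 seconds for 191.89 where parzer printed 23.99783 above, so parzer's internal arithmetic evidently differs slightly from this sketch.

```r
# Hypothetical helper, not part of parzer: splits decimal degrees into
# whole degrees, whole minutes, and fractional seconds. The sign is kept
# on the degrees component only.
dd_to_dms <- function(x) {
  a <- abs(x)
  deg <- floor(a)
  min_full <- (a - deg) * 60     # leftover fraction of a degree, in minutes
  min <- floor(min_full)
  sec <- (min_full - min) * 60   # leftover fraction of a minute, in seconds
  data.frame(deg = sign(x) * deg, min = min, sec = sec)
}

dd_to_dms(c(191.89, 12))
```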
pz_d(), pz_m(), and pz_s() let you add and subtract degrees, minutes, and seconds:
pz_d(31)
#> 31
pz_d(31) + pz_m(44)
#> 31.73333
pz_d(31) - pz_m(44)
#> 30.26667
pz_d(31) + pz_m(44) + pz_s(59)
#> 31.74972
pz_d(-121) + pz_m(1) + pz_s(33)
#> -120.9742
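Every value printed above is consistent with the standard conversion to decimal degrees, where d, m, and s are degrees, minutes, and seconds. This is our reading of the output rather than a statement about parzer's internals:

```latex
\begin{aligned}
\text{decimal degrees} &= d + \frac{m}{60} + \frac{s}{3600} \\
31 + \frac{44}{60} + \frac{59}{3600} &\approx 31 + 0.73333 + 0.01639 = 31.74972 \\
-121 + \frac{1}{60} + \frac{33}{3600} &\approx -121 + 0.01667 + 0.00917 \approx -120.97417
\end{aligned}
```

In the last line the minutes and seconds are added to -121, so they move the result toward zero. A coordinate of 121° 1′ 33″ W written out in full would instead be -(121 + 1/60 + 33/3600) ≈ -121.0258.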
There’s more to do. We are thinking about dropping the Rcpp dependency, supporting the parsing of strings that contain both latitude and longitude, improving error messages, and more.