6 R packages to download open spatial data

Spatial data are used in a wide range of disciplines including environment, health, agriculture, economy, and society (Moraga and Baker 2022). Several R packages have been recently developed as clients for various databases that can be used for easy access of spatial data including administrative boundaries, climatic, and OpenStreetMap data. Here, we give short reproducible examples on how to download and visualize spatial data that can be useful in different settings. More extended examples and details about the capabilities of each of the packages can be seen at the packages’ websites, and the rspatialdata website which provides a collection of tutorials on R packages to download and visualize spatial data using R.

6.1 Administrative boundaries of countries

We can download administrative boundaries of world countries with the rnaturalearth package (South 2017). Other packages can also be used to obtain data of specific countries such as the USA with tidycensus (Walker and Herman 2023) and tigris (Walker 2023), Spain with mapSpain (Hernangomez 2022), and Brazil with geobr (Pereira and Goncalves 2022). The giscoR package (Hernangomez 2023a) helps to retrieve data from Eurostat - GISCO (the Geographic Information System of the COmmission) which contains several open data such as countries and coastal lines.

Here, we use rnaturalearth to download the administrative boundaries from Natural Earth map data. Note that when installing rnaturalearth, we may get an error that can be fixed by installing the rnaturalearthhires package (South 2023). The ne_countries() function allows us to download the map of the country specified in argument country, of scale given in scale, and of class sp or sf given in returnclass. We can retrieve the possible names that can be specified in argument country by typing ne_countries()$admin. The ne_states() function can be used to obtain administrative divisions for specific countries.

Figure 6.1 shows maps of Germany and its divisions downloaded with rnaturalearth, and plotted side-by-side with the patchwork package (Pedersen 2022).

# install.packages("devtools")
# devtools::install_github("ropensci/rnaturalearthhires")

library(rnaturalearth)
library(sf)
library(ggplot2)
library(viridis)
library(patchwork)

map1 <- ne_countries(type = "countries", country = "Germany",
                     scale = "medium", returnclass = "sf")
map2 <- rnaturalearth::ne_states("Germany", returnclass = "sf")
p1 <- ggplot(map1) + geom_sf()
p2 <- ggplot(map2) + geom_sf()
p1 + p2
Maps of Germany and its divisions obtained with **rnaturalearth**.

FIGURE 6.1: Maps of Germany and its divisions obtained with rnaturalearth.

Note that in case of needing maps of administrative boundaries at greater levels than the ones provided by these packages, we could resort to other sources such as online data repositories maintained by specific countries.

6.2 Climatic data

The geodata package (Hijmans et al. 2023) allows us to download geographic data including climate, elevation, land use, soil, crop, species occurrence, administrative boundaries, and other data. The geodata package is a successor of the getData() function from the raster package (Hijmans 2023).

For example, the worldclim_country() function downloads climate data from WorldClim including minimum temperature (tmin), maximum temperature (tmax), average temperature (tavg), precipitation (prec), and wind speed (wind). The country_codes() function of geodata can be used to get the names and codes of the world countries. Here, we provide an example on how to download minimum temperature in Jamaica using worldclim_country() specifying country = "Jamaica", var = "tmin" and path = tempdir() as the path name of the temporary directory to download the data. This function retrieves the temperature for each month, and we can plot the mean over the months with mean(d) (Figure 6.2).

library(geodata)
d <- worldclim_country(country = "Jamaica", var = "tmin",
                       path = tempdir())
terra::plot(mean(d), plg = list(title = "Min. temperature (C)"))
Minimum temperature in Jamaica obtained with **geodata**.

FIGURE 6.2: Minimum temperature in Jamaica obtained with geodata.

The geodata package can also be used to download other data such as elevation with elevation_30s(), land cover with landcover(), and soil with soil_world().

6.3 Precipitation

The chirps package (de Sousa, Sparks, and Ghosh 2022) allows us to obtain daily high-resolution precipitation, as well as daily maximum and minimum temperatures from the Climate Hazards Group. We use the get_chirps() function to obtain daily precipitation in Bangkok, Thailand, by specifying the longitude and latitude coordinates of Bangkok, the dates, and the server source. Here, we use the "ClimateSERV" server instead of the default server "CHC", since it is recommended when few data points are required (Figure ??).

library("chirps")
location <- data.frame(long = 100.523186, lat = 13.736717)
d <- get_chirps(location, dates = c("2020-01-01", "2022-12-31"),
                server = "ClimateSERV")
ggplot(d, aes(x = date, y = chirps)) + geom_line() +
  labs(y = "Precipitation (mm)")
Daily precipitation in Bangkok obtained with chirps.

FIGURE 6.3: Daily precipitation in Bangkok obtained with chirps.

6.4 Elevation

The elevatr package (Hollister 2022) allows us to get elevation data from Amazon Web Services (AWS) Terrain Tiles and OpenTopography Global Digital Elevation Models API. The get_elev_raster() function can be used to download elevation at the locations specified in argument locations and with a zoom specified in argument z. Argument clip can be set to "tile" to return full tiles, "bbox" to return data clipped to the bounding box of the locations, or "locations" to return data clipped to the data specified in locations. Figure 6.4 shows the elevation of Switzerland downloaded passing to get_elev_raster() a sf object with the map of the country.

library(rnaturalearth)
library(elevatr)
library(terra)
map <- ne_countries(type = "countries", country = "Switzerland",
                    scale = "medium", returnclass = "sf")
d <- get_elev_raster(locations = map, z = 9, clip = "locations")
terra::plot(rast(d), plg = list(title = "Elevation (m)"))
Elevation in Switzerland obtained with elevatr.

FIGURE 6.4: Elevation in Switzerland obtained with elevatr.

6.5 OpenStreetMap data

OpenStreetMap (OSM) is an open world geographic database updated and maintained by a community of volunteers. We can use the osmdata package (Padgham et al. 2023) to retrieve OSM data including roads, shops, railway stations, and much more. The available_features() function can be used to get the list of recognized features in OSM. This list can be found in the OSM wiki.

[1] "4wd_only"  "abandoned" "abutters"  "access"   
[5] "addr"      "addr:city"

The available_tags() function lists out the tags associated with each feature. For example, tags associated with feature "amenity" can be obtained as follows:

head(available_tags("amenity"))
# A tibble: 6 × 2
  Key     Value          
  <chr>   <chr>          
1 amenity animal_boarding
2 amenity animal_breeding
3 amenity animal_shelter 
4 amenity animal_training
5 amenity arts_centre    
6 amenity atm            

The first step in creating an osmdata query is defining the geographical area we wish to include in the query. This can be done by defining a bounding box that defines a geographical area by its bounding latitudes and longitudes. The bounding box for a given place name can be obtained with the getbb() function. For example, the bounding box of Barcelona, Spain, can be obtained as follows.

placebb <- getbb("Barcelona")
placebb
     min    max
x  2.052  2.228
y 41.317 41.468

To retrieve the required features of a place defined by the bounding box, we need to create an overpass query with opq(). Then, the add_osm_feature() function can be used to add the required features to the query. Finally, we use the osmdata_sf() function to obtain a simple feature object of the resultant query. For example, we can obtain the hospitals of Barcelona by specifying its bounding box placebb and using add_osm_feature() with key = "amenity" and value = "hospital" as follows.

hospitals <- placebb %>% opq() %>%
  add_osm_feature(key = "amenity", value = "hospital") %>%
  osmdata_sf()

Motorways can be retrieved using key = "highway" and value = "motorway".

motorways <- placebb %>% opq() %>%
  add_osm_feature(key = "highway", value = "motorway") %>%
  osmdata_sf()

Figure 6.5 shows an interactive map with the hospitals and motorways of Barcelona downloaded with osmdata.

library(leaflet)
leaflet() %>%  addTiles() %>%
  addPolylines(data = motorways$osm_lines, color = "black") %>%
  addPolygons(data = hospitals$osm_polygons,
              label = hospitals$osm_polygons$name)

FIGURE 6.5: Map with the hospitals and motorways of Barcelona obtained with osmdata.

6.6 World Bank data

The World Bank provides a great source of global socio-economic data spanning several decades and dozens of topics, with the potential to shed light on numerous global issues. Some of the indicators can be seen at this website. The wbstats package (Piburn 2020) allows us to search and download data from the World Bank API. The wb_search() function can be used to find indicators that match a search term. For example, we can find indicators that contain the words "poverty" or "unemployment" as follows.

library(wbstats)
indicators <- wb_search(pattern = "poverty|unemployment")
# print(indicators)

We can inspect the indicators retrieved with View(indicators). The function wb_data() allows us to retrieve the chosen data. For example, here we download Human Development Index which has ID MO.INDEX.HDEV.XQ in 2011.

d <- wb_data(indicator = "MO.INDEX.HDEV.XQ",
             start_date = 2011, end_date = 2011)
print(head(d))
# A tibble: 6 × 9
  iso2c iso3c country       date MO.INDEX.HDEV.XQ unit 
  <chr> <chr> <chr>        <dbl>            <dbl> <chr>
1 AO    AGO   Angola        2011             47.7 <NA> 
2 BI    BDI   Burundi       2011             48.5 <NA> 
3 BJ    BEN   Benin         2011             53.2 <NA> 
4 BF    BFA   Burkina Faso  2011             45.8 <NA> 
5 BW    BWA   Botswana      2011             80.3 <NA> 
6 CF    CAF   Central Afr…  2011             32.9 <NA> 
# ℹ 3 more variables: obs_status <chr>,
#   footnote <chr>, last_updated <date>

We can visualize the data by adding the data d to a map of Africa retrieved with the ne_countries() function of rnaturalearth (Figure ??).

library(rnaturalearth)
library(mapview)
map <- ne_countries(continent = "Africa", returnclass = "sf")
map <- dplyr::left_join(map, d, by = c("iso_a3" = "iso3c"))
mapview(map, zcol = "MO.INDEX.HDEV.XQ")

6.7 Species occurrence

The spocc package (Chamberlain 2021) is an interface to many species occurrence data sources including Global Biodiversity Information Facility (GBIF), USGSs’ Biodiversity Information Serving Our Nation (BISON), iNaturalist, eBird, Integrated Digitized Biocollections (iDigBio), VertNet, Ocean Biogeographic Information System (OBIS), and Atlas of Living Australia (ALA). The package provides functionality to retrieve and combine species occurrence data.

The occ() function from spocc can be used to retrieve the locations of species. Here, we download data on brown-throated sloths in Costa Rica recorded between 2000 and 2019 from the GBIF database. Arguments of this function include query with the species scientific name (Bradypus variegatus), from with the name of the database (GBIF), and date with the start and end dates (2000-01-01 to 2019-12-31). We also specify that we wish to retrieve occurrences in Costa Rica by setting gbifopts to a named list with country equal to the 2-letter code of Costa Rica (CR). Moreover, we only retrieve occurrence data that have coordinates by setting has_coords = TRUE, and specify limit equal to 1000 to retrieve a maximum of 1000 occurrences.

library('spocc')
df <- occ(query = "Bradypus variegatus", from = "gbif",
          date = c("2000-01-01", "2019-12-31"),
          gbifopts = list(country = "CR"),
          has_coords = TRUE, limit = 1000)
d <- occ2df(df)

Then, we transform the point data to a sf object with st_as_sf(), assign the coordinate reference system given by the EPSG code 4326 to represent longitude and latitude coordinates. Figure 6.6 shows the retrieved locations of sloths in Costa Rica.

library(sf)
d <- st_as_sf(d, coords = c("longitude", "latitude"))
st_crs(d) <- 4326
mapview(d)

FIGURE 6.6: Locations of sloths in Costa Rica obtained with spocc.

6.8 Population, health, and other spatial data

Other packages that can be used to obtain spatial data for the world or specific countries include the following: The wopr package (Leasure et al. 2023) provides access to the WorldPop Open Population Repository and provides estimates of population sizes for specific geographic areas. These data are collected by the WorldPop Hub (https://hub.worldpop.org/), which provides open high-resolution geospatial data on population count and density, demographic and dynamics, with a focus on low- and middle-income countries.

The rdhs package (Watson and Eaton 2022) gives the users the ability to access and make analysis on the Demographic and Health Survey (DHS) data. The malariaAtlas package (Pfeffer et al. 2020) can be used to download, visualize and manipulate global malaria data hosted by the Malaria Atlas Project. The openair package (Carslaw, Davison, and Ropkins 2023) allows us to obtain air quality data and other atmospheric composition data.

Many other spatial datasets are included in several packages mainly to demonstrate the packages’ functionality. For example, spatstat (Baddeley, Turner, and Rubak 2022) contains point pattern data that can be listed with data(package="spatstat.data"). The spData package (Bivand, Nowosad, and Lovelace 2022) also includes diverse spatial datasets that can be used for teaching spatial data analysis.

R also has packages that allow us to geocode place names or addresses. For example, the packages ggmap (Kahle, Wickham, and Jackson 2022) and opencage (Possenriede, Sadler, and Salmon 2021) can be used to convert names to geographic coordinates.