3 The sf package for spatial vector data

3.1 The sf package

The sf package (Pebesma 2022a) can be used to represent and work with spatial vector data including points, polygons, and lines, and their associated information. The sf package uses sf objects that are extensions of data frames containing a collection of simple features or spatial objects with possibly associated data.

We can read a sf object with the st_read() function of sf. For example, here we read the nc shapefile of sf which contains the counties of North Carolina, USA, as well as their name, number of births, and number of sudden infant deaths in 1974 and 1979.

library(sf)
pathshp <- system.file("shape/nc.shp", package = "sf")
nc <- st_read(pathshp, quiet = TRUE)
class(nc)
[1] "sf"         "data.frame"

The sf object nc is a data.frame containing a collection with 100 simple features (rows) and 6 attributes (columns) plus a list-column with the geometry of each feature. A sf object contains the following objects of class sf, sfc and sfg:

  • sf (simple feature): each row of the data.frame is a single simple feature consisting of attributes and geometry.
  • sfc (simple feature geometry list-column): the geometry column of the data.frame is a list-column of class sfc with the geometry of each simple feature.
  • sfg (simple feature geometry): each of the rows of the sfc list-column corresponds to the simple feature geometry (sfg) of a single simple feature.

We can see and plot the information of the sf object as follows (Figure 3.1):

print(nc)
Simple feature collection with 100 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -84.32 ymin: 33.88 xmax: -75.46 ymax: 36.59
Geodetic CRS:  NAD27
First 10 features:
    AREA PERIMETER CNTY_ CNTY_ID        NAME  FIPS
1  0.114     1.442  1825    1825        Ashe 37009
2  0.061     1.231  1827    1827   Alleghany 37005
3  0.143     1.630  1828    1828       Surry 37171
4  0.070     2.968  1831    1831   Currituck 37053
5  0.153     2.206  1832    1832 Northampton 37131
6  0.097     1.670  1833    1833    Hertford 37091
7  0.062     1.547  1834    1834      Camden 37029
8  0.091     1.284  1835    1835       Gates 37073
9  0.118     1.421  1836    1836      Warren 37185
10 0.124     1.428  1837    1837      Stokes 37169
   FIPSNO CRESS_ID BIR74 SID74 NWBIR74 BIR79 SID79
1   37009        5  1091     1      10  1364     0
2   37005        3   487     0      10   542     3
3   37171       86  3188     5     208  3616     6
4   37053       27   508     1     123   830     2
5   37131       66  1421     9    1066  1606     3
6   37091       46  1452     7     954  1838     5
7   37029       15   286     0     115   350     2
8   37073       37   420     0     254   594     2
9   37185       93   968     4     748  1190     2
10  37169       85  1612     1     160  2038     5
   NWBIR79                       geometry
1       19 MULTIPOLYGON (((-81.47 36.2...
2       12 MULTIPOLYGON (((-81.24 36.3...
3      260 MULTIPOLYGON (((-80.46 36.2...
4      145 MULTIPOLYGON (((-76.01 36.3...
5     1197 MULTIPOLYGON (((-77.22 36.2...
6     1237 MULTIPOLYGON (((-76.75 36.2...
7      139 MULTIPOLYGON (((-76.01 36.3...
8      371 MULTIPOLYGON (((-76.56 36.3...
9      844 MULTIPOLYGON (((-78.31 36.2...
10     176 MULTIPOLYGON (((-80.03 36.2...
plot(nc)
`sf` object representing the counties of North Carolina, USA, and associated information.

FIGURE 3.1: sf object representing the counties of North Carolina, USA, and associated information.

We can subset feature sets by using the square bracket notation and use the drop argument to drop geometries.

nc[1, ] # first row
nc[nc$NAME == "Ashe", ] # row with NAME "Ashe"
nc[1, "NWBIR74"] # first row, column with name NWBIR74
nc[1, "NWBIR74", drop = TRUE] # drop geometry

The st_geometry() function can be used to retrieve the simple feature geometry list-column (sfc).

# Geometries printed in abbreviated form
st_geometry(nc)
# View complete geometry by selecting one
st_geometry(nc)[[1]]

3.2 Creating a sf object

We can use the st_sf() function to create a sf object by providing two elements, namely, a data.frame with the attributes of each feature, and a simple feature geometry list-column sfc containing simple feature geometries sfg. In more detail, we create simple feature geometries sfg and use the st_sfc() function to create a simple feature geometry list-column sfc with them. Then, we use st_sf() to put the data.frame with the attributes and the simple feature geometry list-column sfc together.

Simple feature geometries sfg objects can be, for example, of type POINT (single point), MULTIPOINT (set of points) or POLYGON (polygon), and can be created with st_point(), st_multipoint() and st_polygon(), respectively. Here, we create a sf object containing two single points, a set of points, and a polygon, with one attribute. First, we create the simple feature geometry objects (sfg) of type POINT, MULTIPOINT, and POLYGON. Then, we use st_sfc() to create a simple feature geometry list-column sfc with the sfg objects. Finally, we use st_sf() to put the data.frame with the attribute and the simple feature geometry list-column sfc together. Figure 3.2 shows the resulting sf object plotted with ggplot2.

# Single point (point as a vector)
p1_sfg <- st_point(c(2, 2))
p2_sfg <- st_point(c(2.5, 3))

# Set of points (points as a matrix)
p <- rbind(c(6, 2), c(6.1, 2.6), c(6.8, 2.5),
           c(6.2, 1.5), c(6.8, 1.8))
mp_sfg <- st_multipoint(p)

# Polygon. Sequence of points that form a closed,
# non-self intersecting ring.
# The first ring denotes the exterior ring,
# zero or more subsequent rings denote holes in the exterior ring
p1 <- rbind(c(10, 0), c(11, 0), c(13, 2),
            c(12, 4), c(11, 4), c(10, 0))
p2 <- rbind(c(11, 1), c(11, 2), c(12, 2), c(11, 1))
pol_sfg <- st_polygon(list(p1, p2))

# Create sf object
p_sfc <- st_sfc(p1_sfg, p2_sfg, mp_sfg, pol_sfg)
df <- data.frame(v1 = c("A", "B", "C", "D"))
p_sf <- st_sf(df, geometry = p_sfc)

# Plot single points, set of points and polygon
library(ggplot2)
ggplot(p_sf) + geom_sf(aes(col = v1), size = 3) + theme_bw()
`sf` object representing two single points, a set of points, and a polygon, with one attribute.

FIGURE 3.2: sf object representing two single points, a set of points, and a polygon, with one attribute.

3.3 st_*() functions

Common functions to manipulate sf objects include the following:

For example, we can read a sf object as follows:

library(sf)
library(ggplot2)
map <- read_sf(system.file("shape/nc.shp", package = "sf"))

We can inspect the first rows of the sf object map with head().

head(map)
Simple feature collection with 6 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -81.74 ymin: 36.07 xmax: -75.77 ymax: 36.59
Geodetic CRS:  NAD27
# A tibble: 6 × 15
   AREA PERIMETER CNTY_ CNTY_ID NAME       FIPS  FIPSNO
  <dbl>     <dbl> <dbl>   <dbl> <chr>      <chr>  <dbl>
1 0.114      1.44  1825    1825 Ashe       37009  37009
2 0.061      1.23  1827    1827 Alleghany  37005  37005
3 0.143      1.63  1828    1828 Surry      37171  37171
4 0.07       2.97  1831    1831 Currituck  37053  37053
5 0.153      2.21  1832    1832 Northampt… 37131  37131
6 0.097      1.67  1833    1833 Hertford   37091  37091
# ℹ 8 more variables: CRESS_ID <int>, BIR74 <dbl>,
#   SID74 <dbl>, NWBIR74 <dbl>, BIR79 <dbl>,
#   SID79 <dbl>, NWBIR79 <dbl>,
#   geometry <MULTIPOLYGON [°]>

We can delete some of the polygons by taking a subset of the rows of map. We can use st_union() with argument by_feature = FALSE to combine all geometries together. The boundaries of a map can be simplified with the st_simplify() function (Figure 3.3).

# Delete polygon
map <- map[-which(map$FIPS %in% c("37125", "37051")), ]
ggplot(map) + geom_sf(aes(fill = SID79))

# Combine geometries
ggplot(st_union(map, by_feature = FALSE) %>% st_sf()) + geom_sf()

# Simplify
ggplot(st_simplify(map, dTolerance = 10000)) + geom_sf()
`sf` object obtained by deleting some of its polygons (top), combining polygons (middle), and simplifying polygons (bottom).

FIGURE 3.3: sf object obtained by deleting some of its polygons (top), combining polygons (middle), and simplifying polygons (bottom).

3.4 Transforming point data to a sf object

The st_as_sf() function allows us to convert a foreign object to a sf object. For example, we can have a data frame containing the coordinates of a number of locations and attributes that we wish to turn into a sf object. To do that, we use the st_as_sf() function passing the object we wish to convert and specifying in argument coords the name of the columns that contain the point coordinates. For example, here we use st_as_sf() to turn a data frame containing coordinates long and lat and two variables place and value to a sf object. Then, we use st_crs() to set the coordinate reference system given by the EPSG code 4326 to represent longitude and latitude coordinates. Figure 3.4 shows the plot of the sf object obtained with mapview.

library(sf)
library(mapview)

d <- data.frame(
place = c("London", "Paris", "Madrid", "Rome"),
long = c(-0.118092, 2.349014, -3.703339, 12.496366),
lat = c(51.509865, 48.864716, 40.416729, 41.902782),
value = c(200, 300, 400, 500))
class(d)
[1] "data.frame"
dsf <- st_as_sf(d, coords = c("long", "lat"))
st_crs(dsf) <- 4326
class(dsf)
[1] "sf"         "data.frame"
mapview(dsf)

FIGURE 3.4: sf object with points.

3.5 Counting the number of points within polygons

We can use the st_intersects() function of sf to count the number of points within the polygons of a sf object. The returned object is a list with feature ids intersected in each of the polygons. We can use the lengths() function to calculate the number of points inside each feature as seen in Figure 3.5.

library(sf)
library(ggplot2)

# Map with divisions (sf object)
map <- read_sf(system.file("shape/nc.shp", package = "sf"))

# Points over map (simple feature geometry list-column sfc)
points <- st_sample(map, size = 100)

# Map of points within polygons
ggplot() + geom_sf(data = map) + geom_sf(data = points)

# Intersection (first argument map, then points)
inter <- st_intersects(map, points)

# Add point count to each polygon
map$count <- lengths(inter)

# Map of number of points within polygons
ggplot(map) + geom_sf(aes(fill = count))
Top: Points within polygons. Bottom: Number of points within polygons.

FIGURE 3.5: Top: Points within polygons. Bottom: Number of points within polygons.

3.6 Identifying polygons containing points

Given a sf object with points and a sf object with polygons, we can also use the st_intersects() function to obtain the polygon each of the points belongs to. For example, Figure 3.6 shows a map with the names of the polygons that contain three points over the map.

library(sf)
library(ggplot2)

# Map with divisions (sf object)
map <- read_sf(system.file("shape/nc.shp", package = "sf"))

# Points over map (sf object)
points <- st_sample(map, size = 3) %>% st_as_sf()

# Intersection (first argument points, then map)
inter <- st_intersects(points, map)

# Adding column areaname with the name of
# the areas containing the points
points$areaname <- map[unlist(inter), "NAME",
                       drop = TRUE] # drop geometry
points
Simple feature collection with 3 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -80.18 ymin: 34.72 xmax: -77.91 ymax: 35.57
Geodetic CRS:  NAD27
                     x areaname
1 POINT (-80.18 35.57) Davidson
2 POINT (-79.26 34.72)  Robeson
3 POINT (-77.91 35.19)    Wayne
# Map
ggplot(map) + geom_sf() + geom_sf(data = points) + 
 geom_sf_label(data = map[unlist(inter), ],
               aes(label = NAME), nudge_y = 0.2)
Map showing the name of polygons that contain three points over the map.

FIGURE 3.6: Map showing the name of polygons that contain three points over the map.

3.7 Joining map and data

Sometimes, a map and its corresponding data are available separately and we may wish to create a sf object representing the map with the added data that we can manipulate and plot. We can create a sf map with the data attributes by joining the map and the data with the left_join() function of the dplyr (Wickham, Francois, et al. 2022) package. Here, we present an example where we add air pollution data obtained with the wbstats (Piburn 2020) package to a sf map obtained with the rnaturalearth package. First, we use the ne_countries() function of rnaturalearth (South 2017) to download the world map with the country polygons of class sf.

library(rnaturalearth)
map <- ne_countries(returnclass = "sf")

Then, we use the wbstats package to download a data frame of air pollution data from the World Bank. Specifically, we search the pollution indicators with wb_search(), and use wb_data() to download PM2.5 in year 2016 by specifying the indicator corresponding to PM2.5, and the start and end dates.

library(wbstats)
indicators <- wb_search(pattern = "pollution")
d <- wb_data(indicator = "EN.ATM.PM25.MC.M3",
             start_date = 2016, end_date = 2016)

Then, we use the left_join() function of dplyr to join the map and the data specifying the argument by the variables we wish to join by. Here, we use ISO3 standard code of the countries rather than the country names, since names can be written differently in the map and the data frame. Figure 3.7 shows the map of the data obtaind with ggplot2.

library(dplyr)
library(ggplot2)
library(viridis)

map1 <- left_join(map, d, by = c("iso_a3" = "iso3c"))
ggplot(map1) + geom_sf(aes(fill = EN.ATM.PM25.MC.M3)) +
  scale_fill_viridis() + labs(fill = "PM2.5") + theme_bw()
PM2.5 values in each of the world countries in 2016.

FIGURE 3.7: PM2.5 values in each of the world countries in 2016.

Note that when we use left_join(), the class of the resulting object is the same as the class of the first argument. Thus, left_join(sf_object, data.frame_object) creates a sf object, whereas left_join(data.frame_object, sf_object) is a data.frame:

map1 <- left_join(map, d, by = c("iso_a3" = "iso3c"))
class(map1)
[1] "sf"         "data.frame"
d1 <- left_join(d, map, by = c("iso3c" = "iso_a3"))
class(d1)
[1] "tbl_df"     "tbl"        "data.frame"

Note also that given two data frames x and y, we can join them by using the left_join(), right_join(), inner_join() or full_join() functions. Specifically, left_join(x, y) includes all rows in x so the resulting data frame includes all observations in the left data frame x, whether or not there is a match in the right data frame y. right_join(x, y) includes all rows in y so the resulting data frame includes all observations in y, whether or not there is a match in x. inner_join(x, y) includes all rows in x and y so the resulting data frame includes only observations that are in both data frames. Finally, full_join(x, y) includes all rows in x or y so the resulting data frame includes all observations from both data frames.