3 The sf package for spatial vector data
3.1 The sf package
The sf package (Pebesma 2022a) can be used to represent and work with spatial vector data including points, polygons, and lines, and their associated information.
The sf package uses sf
objects that are extensions of data frames containing a collection of simple features or spatial objects with possibly associated data.
We can read a sf
object with the st_read()
function of sf.
For example, here we read the nc
shapefile of sf which contains the counties of North Carolina, USA, as well as their name, number of births, and number of sudden infant deaths in 1974 and 1979.
library(sf)
pathshp <- system.file("shape/nc.shp", package = "sf")
nc <- st_read(pathshp, quiet = TRUE)
class(nc)
[1] "sf" "data.frame"
The sf
object nc
is a data.frame
containing a collection with 100 simple features (rows) and 6 attributes (columns) plus a list-column with the geometry of each feature. A sf
object contains the following objects of class sf
, sfc
and sfg
:
-
sf
(simple feature): each row of thedata.frame
is a single simple feature consisting of attributes and geometry. -
sfc
(simple feature geometry list-column): thegeometry
column of thedata.frame
is a list-column of classsfc
with the geometry of each simple feature. -
sfg
(simple feature geometry): each of the rows of thesfc
list-column corresponds to the simple feature geometry (sfg
) of a single simple feature.
We can see and plot the information of the sf
object as follows (Figure 3.1):
print(nc)
Simple feature collection with 100 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -84.32 ymin: 33.88 xmax: -75.46 ymax: 36.59
Geodetic CRS: NAD27
First 10 features:
AREA PERIMETER CNTY_ CNTY_ID NAME FIPS
1 0.114 1.442 1825 1825 Ashe 37009
2 0.061 1.231 1827 1827 Alleghany 37005
3 0.143 1.630 1828 1828 Surry 37171
4 0.070 2.968 1831 1831 Currituck 37053
5 0.153 2.206 1832 1832 Northampton 37131
6 0.097 1.670 1833 1833 Hertford 37091
7 0.062 1.547 1834 1834 Camden 37029
8 0.091 1.284 1835 1835 Gates 37073
9 0.118 1.421 1836 1836 Warren 37185
10 0.124 1.428 1837 1837 Stokes 37169
FIPSNO CRESS_ID BIR74 SID74 NWBIR74 BIR79 SID79
1 37009 5 1091 1 10 1364 0
2 37005 3 487 0 10 542 3
3 37171 86 3188 5 208 3616 6
4 37053 27 508 1 123 830 2
5 37131 66 1421 9 1066 1606 3
6 37091 46 1452 7 954 1838 5
7 37029 15 286 0 115 350 2
8 37073 37 420 0 254 594 2
9 37185 93 968 4 748 1190 2
10 37169 85 1612 1 160 2038 5
NWBIR79 geometry
1 19 MULTIPOLYGON (((-81.47 36.2...
2 12 MULTIPOLYGON (((-81.24 36.3...
3 260 MULTIPOLYGON (((-80.46 36.2...
4 145 MULTIPOLYGON (((-76.01 36.3...
5 1197 MULTIPOLYGON (((-77.22 36.2...
6 1237 MULTIPOLYGON (((-76.75 36.2...
7 139 MULTIPOLYGON (((-76.01 36.3...
8 371 MULTIPOLYGON (((-76.56 36.3...
9 844 MULTIPOLYGON (((-78.31 36.2...
10 176 MULTIPOLYGON (((-80.03 36.2...
plot(nc)
We can subset feature sets by using the square bracket notation and use the drop
argument to drop geometries.
nc[1, ] # first row
nc[nc$NAME == "Ashe", ] # row with NAME "Ashe"
nc[1, "NWBIR74"] # first row, column with name NWBIR74
nc[1, "NWBIR74", drop = TRUE] # drop geometry
The st_geometry()
function can be used to retrieve
the simple feature geometry list-column (sfc
).
# Geometries printed in abbreviated form
st_geometry(nc)
# View complete geometry by selecting one
st_geometry(nc)[[1]]
3.2 Creating a sf
object
We can use the st_sf()
function to create a sf
object by providing two elements, namely,
a data.frame
with the attributes of each feature, and
a simple feature geometry list-column sfc
containing simple feature geometries sfg
.
In more detail, we create simple feature geometries sfg
and use the st_sfc()
function to create a simple feature geometry list-column sfc
with them. Then, we use st_sf()
to put the data.frame
with the attributes and the simple feature geometry list-column sfc
together.
Simple feature geometries sfg
objects can be, for example, of type POINT
(single point), MULTIPOINT
(set of points) or POLYGON
(polygon), and can be created with st_point()
, st_multipoint()
and st_polygon()
, respectively.
Here, we create a sf
object containing two single points, a set of points, and a polygon, with one attribute.
First, we create the simple feature geometry objects (sfg
) of type POINT
, MULTIPOINT
, and POLYGON
.
Then, we use st_sfc()
to create a simple feature geometry list-column sfc
with the sfg
objects.
Finally, we use st_sf()
to put the data.frame
with the attribute and the simple feature geometry list-column sfc
together.
Figure 3.2 shows the resulting sf
object plotted with ggplot2.
# Single point (point as a vector)
p1_sfg <- st_point(c(2, 2))
p2_sfg <- st_point(c(2.5, 3))
# Set of points (points as a matrix)
p <- rbind(c(6, 2), c(6.1, 2.6), c(6.8, 2.5),
c(6.2, 1.5), c(6.8, 1.8))
mp_sfg <- st_multipoint(p)
# Polygon. Sequence of points that form a closed,
# non-self intersecting ring.
# The first ring denotes the exterior ring,
# zero or more subsequent rings denote holes in the exterior ring
p1 <- rbind(c(10, 0), c(11, 0), c(13, 2),
c(12, 4), c(11, 4), c(10, 0))
p2 <- rbind(c(11, 1), c(11, 2), c(12, 2), c(11, 1))
pol_sfg <- st_polygon(list(p1, p2))
# Create sf object
p_sfc <- st_sfc(p1_sfg, p2_sfg, mp_sfg, pol_sfg)
df <- data.frame(v1 = c("A", "B", "C", "D"))
p_sf <- st_sf(df, geometry = p_sfc)
# Plot single points, set of points and polygon
library(ggplot2)
ggplot(p_sf) + geom_sf(aes(col = v1), size = 3) + theme_bw()
3.3 st_*()
functions
Common functions to manipulate sf
objects include the following:
-
st_read()
reads asf
object, -
st_write()
writes asf
object, -
st_crs()
gets or sets a new coordinate reference system (CRS), -
st_transform()
transforms data to a new CRS, -
st_intersection()
intersectssf
objects, -
st_union()
combines severalsf
objects into one, -
st_simplify()
simplifies asf
object, -
st_coordinates()
retrieves coordinates of asf
object, -
st_as_sf()
converts a foreign object to asf
object.
For example, we can read a sf
object as follows:
We can inspect the first rows of the sf
object map
with head()
.
head(map)
Simple feature collection with 6 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -81.74 ymin: 36.07 xmax: -75.77 ymax: 36.59
Geodetic CRS: NAD27
# A tibble: 6 × 15
AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO
<dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl>
1 0.114 1.44 1825 1825 Ashe 37009 37009
2 0.061 1.23 1827 1827 Alleghany 37005 37005
3 0.143 1.63 1828 1828 Surry 37171 37171
4 0.07 2.97 1831 1831 Currituck 37053 37053
5 0.153 2.21 1832 1832 Northampt… 37131 37131
6 0.097 1.67 1833 1833 Hertford 37091 37091
# ℹ 8 more variables: CRESS_ID <int>, BIR74 <dbl>,
# SID74 <dbl>, NWBIR74 <dbl>, BIR79 <dbl>,
# SID79 <dbl>, NWBIR79 <dbl>,
# geometry <MULTIPOLYGON [°]>
We can delete some of the polygons by taking a subset of the rows of map
.
We can use st_union()
with argument by_feature = FALSE
to combine all geometries together.
The boundaries of a map can be simplified with the st_simplify()
function (Figure 3.3).
# Delete polygon
map <- map[-which(map$FIPS %in% c("37125", "37051")), ]
ggplot(map) + geom_sf(aes(fill = SID79))
# Combine geometries
ggplot(st_union(map, by_feature = FALSE) %>% st_sf()) + geom_sf()
# Simplify
ggplot(st_simplify(map, dTolerance = 10000)) + geom_sf()
3.4 Transforming point data to a sf
object
The st_as_sf()
function allows us to convert a foreign object to a sf
object.
For example, we can have a data frame containing the coordinates of a number of locations and attributes that we wish to turn into a sf
object.
To do that, we use the st_as_sf()
function passing the object we wish to convert and specifying in argument coords
the name of the columns that contain the point coordinates. For example, here we use st_as_sf()
to turn a data frame containing coordinates long
and lat
and two variables place
and value
to a sf
object.
Then, we use st_crs()
to set the coordinate reference system given by the EPSG code 4326 to represent longitude and latitude coordinates.
Figure 3.4 shows the plot of the sf
object obtained with mapview.
library(sf)
library(mapview)
d <- data.frame(
place = c("London", "Paris", "Madrid", "Rome"),
long = c(-0.118092, 2.349014, -3.703339, 12.496366),
lat = c(51.509865, 48.864716, 40.416729, 41.902782),
value = c(200, 300, 400, 500))
class(d)
[1] "data.frame"
[1] "sf" "data.frame"
mapview(dsf)
3.5 Counting the number of points within polygons
We can use the st_intersects()
function of sf to count the number of points within the polygons of a sf
object.
The returned object is a list with feature ids intersected in each of the polygons.
We can use the lengths()
function to calculate the number of points inside each feature as seen in Figure 3.5.
library(sf)
library(ggplot2)
# Map with divisions (sf object)
map <- read_sf(system.file("shape/nc.shp", package = "sf"))
# Points over map (simple feature geometry list-column sfc)
points <- st_sample(map, size = 100)
# Map of points within polygons
ggplot() + geom_sf(data = map) + geom_sf(data = points)
# Intersection (first argument map, then points)
inter <- st_intersects(map, points)
# Add point count to each polygon
map$count <- lengths(inter)
# Map of number of points within polygons
ggplot(map) + geom_sf(aes(fill = count))
3.6 Identifying polygons containing points
Given a sf
object with points and a sf
object with polygons,
we can also use the st_intersects()
function to obtain the polygon each of the points belongs to. For example, Figure 3.6 shows a map with the names of the polygons that contain three points over the map.
library(sf)
library(ggplot2)
# Map with divisions (sf object)
map <- read_sf(system.file("shape/nc.shp", package = "sf"))
# Points over map (sf object)
points <- st_sample(map, size = 3) %>% st_as_sf()
# Intersection (first argument points, then map)
inter <- st_intersects(points, map)
# Adding column areaname with the name of
# the areas containing the points
points$areaname <- map[unlist(inter), "NAME",
drop = TRUE] # drop geometry
points
Simple feature collection with 3 features and 1 field
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -80.18 ymin: 34.72 xmax: -77.91 ymax: 35.57
Geodetic CRS: NAD27
x areaname
1 POINT (-80.18 35.57) Davidson
2 POINT (-79.26 34.72) Robeson
3 POINT (-77.91 35.19) Wayne
# Map
ggplot(map) + geom_sf() + geom_sf(data = points) +
geom_sf_label(data = map[unlist(inter), ],
aes(label = NAME), nudge_y = 0.2)
3.7 Joining map and data
Sometimes, a map and its corresponding data are available separately and we may wish to create a sf
object representing the map with the added data that we can manipulate and plot.
We can create a sf
map with the data attributes by joining the map and the data with the left_join()
function of the dplyr (Wickham, Francois, et al. 2022) package.
Here, we present an example where we add air pollution data obtained with the wbstats (Piburn 2020) package to a sf
map obtained with the rnaturalearth package.
First, we use the ne_countries()
function of rnaturalearth (South 2017) to download the world map with the country polygons of class sf
.
library(rnaturalearth)
map <- ne_countries(returnclass = "sf")
Then, we use the wbstats package to download a data frame of air pollution data from the World Bank. Specifically, we search the pollution indicators with wb_search()
, and use wb_data()
to download PM2.5 in year 2016 by specifying the indicator corresponding to PM2.5, and the start and end dates.
library(wbstats)
indicators <- wb_search(pattern = "pollution")
d <- wb_data(indicator = "EN.ATM.PM25.MC.M3",
start_date = 2016, end_date = 2016)
Then, we use the left_join()
function of dplyr to join the map and the data specifying the argument by
the variables we wish to join by. Here, we use ISO3 standard code of the countries rather than the country names, since names can be written differently in the map and the data frame. Figure 3.7 shows the map of the data obtaind with ggplot2.
library(dplyr)
library(ggplot2)
library(viridis)
map1 <- left_join(map, d, by = c("iso_a3" = "iso3c"))
ggplot(map1) + geom_sf(aes(fill = EN.ATM.PM25.MC.M3)) +
scale_fill_viridis() + labs(fill = "PM2.5") + theme_bw()
Note that when we use left_join()
, the class of the resulting object is the same as the class of the first argument. Thus, left_join(sf_object, data.frame_object)
creates a sf
object, whereas
left_join(data.frame_object, sf_object)
is a data.frame
:
[1] "sf" "data.frame"
[1] "tbl_df" "tbl" "data.frame"
Note also that given two data frames x
and y
, we can join them by using the left_join()
, right_join()
, inner_join()
or full_join()
functions.
Specifically, left_join(x, y)
includes all rows in x
so the resulting data frame includes all observations in the left data frame x
, whether or not there is a match in the right data frame y
.
right_join(x, y)
includes all rows in y
so the resulting data frame includes all observations in y
, whether or not there is a match in x
.
inner_join(x, y)
includes all rows in x
and y
so the resulting data frame includes only observations that are in both data frames.
Finally, full_join(x, y)
includes all rows in x
or y
so the resulting data frame includes all observations from both data frames.