# 9 Bayesian spatial models

Bayesian hierarchical models (Banerjee, Carlin, and Gelfand 2004) can be used to analyze areal data that arise when an outcome variable is aggregated into areas that form a partition of the study region. Models can be specified to describe the variability in the response variable as a function of a number of covariates known to affect the outcome, as well as random effects to model residual variation not explained by the covariates. This provides a flexible and robust approach that allows us to assess the effects of explanatory variables, accommodate spatial autocorrelation, and quantify the uncertainty in the estimates obtained.

A commonly used spatial model is the Besag-York-Mollié (BYM) model (Besag, York, and Mollié 1991). Let us assume that \(Y_i\) are observed outcomes at regions \(i=1,\ldots,n\) that can be modeled using a Normal distribution. The BYM model is specified as

\[Y_i \sim Normal(\mu_i, \sigma^2), i = 1,\ldots,n,\]

\[\mu_i = \boldsymbol{z_i \beta} + u_i + v_i.\]

Here, the fixed effects \(\boldsymbol{z_i \beta}\) are expressed using a vector of intercept and \(p\) covariates corresponding to area \(i\), \(\boldsymbol{z_i} = (1, z_{i1}, \ldots, z_{ip})\), and a coefficient vector \(\boldsymbol{\beta} = (\beta_0, \ldots, \beta_p)'\).

The model includes a spatial random effect \(u_i\) that accounts for the spatial dependence between outcomes indicating that areas that are close to each other may have similar values, and an unstructured exchangeable component \(v_i\) to model uncorrelated noise. The spatial random effect \(u_i\) can be modeled with an intrinsic conditional autoregressive model (CAR) that smooths the data according to a certain neighborhood structure. Specifically,

\[u_i| \boldsymbol{u_{-i}} \sim N\left(\bar u_{\delta_i}, \frac{\sigma_u^2}{n_{\delta_i}}\right),\] where \(\bar u_{\delta_i}= n_{\delta_i}^{-1} \sum_{j \in \delta_i} u_j\), with \(\delta_i\) and \(n_{\delta_i}\) representing, respectively, the set of neighbors and the number of neighbors of area \(i\). The unstructured component \(v_i\) is modeled as independent and identically distributed normal variables with zero mean and variance \(\sigma_v^2\), \(v_i \sim N(0, \sigma_v^2).\)

## 9.1 Bayesian inference with INLA

Bayesian hierarchical models can be fitted using a number of approaches such as integrated nested Laplace approximation (INLA) (Rue, Martino, and Chopin 2009) and Markov chain Monte Carlo (MCMC) methods (Gelman et al. 2013). INLA is a computational approach to perform approximate Bayesian inference in latent Gaussian models. This includes a wide range of models such as generalized linear mixed models and spatial and spatio-temporal models. INLA uses a combination of analytical approximations and numerical integration to obtain approximated posterior distributions of the parameters and is very fast compared to MCMC methods.

The **R-INLA** package (Rue, Lindgren, and Teixeira Krainski 2022) can be used to fit models using INLA.
The INLA website (http://www.r-inla.org) includes documentation, examples, and other resources about INLA and the **R-INLA** package, including books that provide an introduction to Bayesian data analysis using INLA as well as practical examples in a variety of settings (Wang, Ryan, and Faraway 2018; Krainski et al. 2019; Moraga 2019; Gómez-Rubio 2020).

To install **R-INLA**, we use the `install.packages()`

function specifying the **R-INLA** repository since the package is not on CRAN.

```
install.packages("INLA",
repos = "https://inla.r-inla-download.org/R/stable", dep = TRUE)
library(INLA)
```

Then, to fit a model, we write the linear predictor as a formula object in R,
and call the `inla()`

function passing the formula, the family distribution, the data, and other options.
The object returned by `inla()`

contains the fitted model. This object can be inspected,
and the posterior distributions can be post-processed using a set of functions provided by **R-INLA**.
**R-INLA** also provides functionality to specify priors, as well as to obtain a number of criteria that allow us to assess and compare Bayesian models such as the deviance information criterion (DIC) (Spiegelhalter et al. 2002), the Watanabe-Akaike information criterion (WAIC) (Watanabe 2010), and the conditional predictive ordinate (CPO) (Held, Schrödle, and Rue 2010).

## 9.2 Spatial modeling of housing prices

Here, we provide an example on how to specify and fit a Bayesian hierarchical model to estimate housing prices in Boston, Massachusetts, USA, using the **R-INLA** package.

### 9.2.1 Housing prices in Boston, Massachusetts, USA

The Boston housing prices are in the **spData** package (Bivand, Nowosad, and Lovelace 2022), and can be obtained with the `st_read()`

function of the **sf** package (Pebesma 2022a) as follows.

```
library(sf)
library(spData)
map <- st_read(system.file("shapes/boston_tracts.shp",
package = "spData"), quiet = TRUE)
```

This dataset contains housing data of 506 Boston census tracts including median prices of owner-occupied housing in $1000 USD (`MEDV`

), per capita crime (`CRIM`

), and average number of rooms per dwelling (`RM`

).
We create the variable called `vble`

with the logarithm of the median prices, and map this variable using **mapview** (Figure 9.1). The map suggests that the housing prices are greater in the west, and prices are related to those in neighboring areas.

We will model the logarithm of the median prices using as covariates the per capita crime (`CRIM`

) and the average number of rooms per dwelling (`RM`

).
Figure 9.2 shows the relationships between pairs of variables visualized using the `ggpairs()`

function of the **GGally** package (Schloerke et al. 2021).
We observe a negative relationship between the logarithm of housing price and crime, and a
positive relationship between the logarithm of housing price and the average number of rooms.

### 9.2.2 Model

Let \(Y_i\) be the logarithm of housing price of area \(i\), \(i=1,\ldots,n\). We fit a BYM model that considers \(Y_i\) as the response variable, and crime and number of rooms as covariates:

\[Y_i \sim N(\mu_i, \sigma^2),\ i=1,\ldots,n,\]

\[\mu_i = \beta_0 + \beta_1 \times \mbox{crime}_i + \beta_2 \times \mbox{rooms}_i + u_i + v_i.\]

Here, \(\beta_0\) is the intercept, and \(\beta_1\) and \(\beta_2\) represent, respectively, the coefficients of the covariates crime and number of rooms. \(u_i\) is a spatially structured effect modeled with a CAR structure, \(u_i|\mathbf{u_{-i}} \sim N(\bar{u}_{\delta_i}, \frac{\sigma_u^2}{n_{\delta_i}})\). \(v_i\) is an unstructured effect modeled as \(v_i \sim N(0, \sigma_v^2)\).

### 9.2.3 Neighborhood matrix

In the model, the spatial random effect \(u_i\) needs to be specified using a neighborhood structure.
Here, we assume two areas are neighbors if they share a common boundary, and we create a neighborhood structure using functions of the **spdep** package (Bivand 2022).
First, we use the `poly2nb()`

function to create a neighbors list based on areas with contiguous boundaries.
Each element of the list `nb`

represents one area and contains the indices of its neighbors.
For example, `nb[[1]]`

contains the neighbors of area 1.

```
[[1]]
[1] 2 3 6 8 311 313 314 369
[[2]]
[1] 1 3 4 6
[[3]]
[1] 1 2 4 5 369 371 375 376
[[4]]
[1] 2 3 5 6
[[5]]
[1] 3 4 6 7 375 376 411 413 418
[[6]]
[1] 1 2 4 5 7 8
```

Then, we use the `nb2INLA()`

function to convert the `nb`

list into a file called `map.adj`

with the representation of the neighborhood matrix as required by **R-INLA**.
The `map.adj`

file is saved in the working directory which can be obtained with `getwd()`

.
Then, we read the `map.adj`

file using the `inla.read.graph()`

function of **R-INLA**, and store it in the object `g`

which we later use to specify the spatial model using **R-INLA**.

```
nb2INLA("map.adj", nb)
g <- inla.read.graph(filename = "map.adj")
```

###
9.2.4 Model formula and `inla()`

call

We specify the model formula by including the response variable, the `~`

symbol, and the fixed and random effects. By default, there is an intercept so we do not need to include it in the formula.
In the formula, random effects are specified with the `f()`

function. This function includes as first argument an index vector that specifies the element of the random effect that applies to each observation, and as second argument the model name.
For the spatial random effect \(u_i\), we use `model = "besag"`

with neighborhood matrix given by `g`

.
The option `scale.model = TRUE`

is used to make the precision parameter of models with different CAR priors comparable (Freni-Sterrantino, Ventrucci, and Rue 2018).
For the unstructured effect \(v_i\), we choose `model = "iid"`

.
The index vectors of the random effects are given by `re_u`

and `re_v`

created for \(u_i\) and \(v_i\), respectively. These vectors are equal to \(1,\ldots,n\), where \(n\) is the number of areas.

```
formula <- vble ~ CRIM + RM +
f(re_u, model = "besag", graph = g, scale.model = TRUE) +
f(re_v, model = "iid")
```

Note that in **R-INLA**, the BYM model can also be specified with `model = "bym"`

and this comprises both the spatial and unstructured components. Alternatively, we can use the BYM2 model (Simpson et al. 2017) which is a new parametrization of the BYM model that uses a scaled spatial component \(\boldsymbol{u_*}\) and an unstructured component \(\boldsymbol{v_*}\):

\[\boldsymbol{b} = \frac{1}{\sqrt{\tau_b}}(\sqrt{1-\phi}\boldsymbol{v_*} + \sqrt{\phi}\boldsymbol{u_*}).\]

In this model, the precision parameter \(\tau_b>0\) controls the marginal variance contribution of the weighted sum of \(\boldsymbol{u_*}\) and \(\boldsymbol{v_*}\). The mixing parameter \(0 \leq \phi \leq 1\) measures the proportion of the marginal variance explained by the spatial component \(\boldsymbol{u_*}\). Thus, the BYM2 model is equal to an only spatial model when \(\phi=1\), and an only unstructured spatial noise when \(\phi=0\) (Riebler et al. 2016). The formula of the model using the BYM2 component is specified as follows.

`formula <- vble ~ CRIM + RM + f(re_u, model = "bym2", graph = g)`

Then, we fit the model by calling the `inla()`

function specifying the formula, the family, the data, and using the default priors in **R-INLA**.
We also set `control.predictor = list(compute = TRUE)`

and `control.compute = list(return.marginals.predictor = TRUE)`

to compute and return the posterior means of the predictors.

### 9.2.5 Results

The resulting object `res`

contains the fit of the model. We can use `summary(res)`

to obtain a summary of the fitted model.
`res$summary.fixed`

contains a summary of the fixed effects.

`res$summary.fixed`

```
mean sd 0.025quant 0.5quant
(Intercept) 1.426719 0.088586 1.25287 1.426742
CRIM -0.007846 0.001293 -0.01038 -0.007846
RM 0.260341 0.014052 0.23279 0.260337
0.975quant mode kld
(Intercept) 1.60044 1.426742 2.146e-10
CRIM -0.00531 -0.007846 2.188e-10
RM 0.28792 0.260337 2.154e-10
```

We observe the intercept \(\hat \beta_0=\) 1.427 with a 95% credible interval equal to (1.253, 1.6). We observe the intercept \(\hat \beta_0=\) –0.008 with a 95% credible interval equal to (–0.01, –0.005). This indicates crime is significantly negatively related to housing price. The coefficient of rooms is \(\hat \beta_2=\) 0.26 with a 95% credible interval equal to (0.233, 0.288) indicating number of rooms is significantly positively related to housing price. Thus, the results suggest both crime and number of rooms are important in explaining the spatial pattern of housing prices.

We can type `res$summary.fitted.values`

to obtain a summary of the posterior distributions of the response \(\mu_i\) for each of the areas.
Column `mean`

indicates the posterior mean, and columns `0.025quant`

and `0.975quant`

are, respectively, the lower and upper limits of 95% credible intervals representing the uncertainty of the estimates obtained.

`summary(res$summary.fitted.values)`

```
mean sd 0.025quant
Min. :1.61 Min. :0.00994 Min. :1.59
1st Qu.:2.83 1st Qu.:0.00997 1st Qu.:2.81
Median :3.05 Median :0.00998 Median :3.03
Mean :3.04 Mean :0.00999 Mean :3.01
3rd Qu.:3.22 3rd Qu.:0.00999 3rd Qu.:3.20
Max. :3.91 Max. :0.01124 Max. :3.89
0.5quant 0.975quant mode
Min. :1.61 Min. :1.63 Min. :1.61
1st Qu.:2.83 1st Qu.:2.86 1st Qu.:2.83
Median :3.05 Median :3.08 Median :3.05
Mean :3.04 Mean :3.06 Mean :3.04
3rd Qu.:3.22 3rd Qu.:3.24 3rd Qu.:3.22
Max. :3.91 Max. :3.93 Max. :3.91
```

We can create variables with the posterior mean (`PM`

) and lower (`LL`

) and upper (`UL`

) limits of 95% credible intervals.

```
# Posterior mean and 95% CI
map$PM <- res$summary.fitted.values[, "mean"]
map$LL <- res$summary.fitted.values[, "0.025quant"]
map$UL <- res$summary.fitted.values[, "0.975quant"]
```

Then, we create maps of these variables with **mapview** specifying a common legend for the three maps, and using a popup table with the name, the logarithm of the housing prices, the covariates, and the posterior mean and 95% credible intervals.
We use the `sync()`

function of **leafsync** to plot synchronized maps.

```
# common legend
at <- seq(min(c(map$PM, map$LL, map$UL)),
max(c(map$PM, map$LL, map$UL)),
length.out = 8)
# popup table
popuptable <- leafpop::popupTable(dplyr::mutate_if(map,
is.numeric, round, digits = 2),
zcol = c("TOWN", "vble", "CRIM", "RM", "PM", "LL", "UL"),
row.numbers = FALSE, feature.id = FALSE)
m1 <- mapview(map, zcol = "PM", map.types = "CartoDB.Positron",
at = at, popup = popuptable)
m2 <- mapview(map, zcol = "LL", map.types = "CartoDB.Positron",
at = at, popup = popuptable)
m3 <- mapview(map, zcol = "UL", map.types = "CartoDB.Positron",
at = at, popup = popuptable)
```