Paula Moraga

<div class = "content">
<br>
<center>
<div style = 'margin-top: -80px; margin-bottom: -65px;'>
<p class="text-center" style = 'font-size: 52px; line-height:1.5; font-weight:bold'>Spatial Modeling and Visualization with R-INLA</p>
</div>
</center>

<br>

<div style = 'padding: 40px; padding-left: 40px; font-size: 32px; font-weight:bold; margin-bottom: -70px;'>
<br>
Paula Moraga, Ph.D.<br><br>
</div>

<div style = 'padding: 40px; padding-left: 40px; font-size: 28px; margin-bottom: -70px;'>
King Abdullah University of Science and Technology (KAUST)<br><br>
</div>

<div style = 'padding: 40px; padding-left: 40px; font-size: 26px; line-height:1.5;margin-bottom: 40px;'>
<a href='http://twitter.com/Paula_Moraga_' target='_blank'> <i class='fa fa-twitter fa-fw'></i>&nbsp; @Paula_Moraga_</a><br>
<a href='https://Paula-Moraga.github.io/' target='_blank'> <i class='fa fa-globe fa-fw'></i>&nbsp; www.PaulaMoraga.com</a><br>
<a href='http://bit.ly/presgeos' target='_blank'><i class='fa fa-link fa-fw'></i>&nbsp; http://bit.ly/presgeos</a><br>
</div>

</td>
<td>

</td>
</tr>
</table>

</div>

---
<div style="margin-top:-20px"></div>

# Who am I?

<table style="margin:20px; border-top:0; border-bottom:0;">
<tr>
<td>
<p style="font-size:25px"><b>Paula Moraga, Ph.D.</b></p>
Assistant Professor of Statistics <br>for Public Health at KAUST<br><br>
PI Geospatial Statistics and Health Surveillance Research Group<br><br>
<img src="./figures/Statistics at KAUST_Logo for digital use_small.png" height = "120" alt = "a png">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<img src="./figures/logogeohealth.png" height = "120" alt = "a png">
</td>
<td style="width:40%">
<img class = "circular--square" src="./figures/paula.png" width = "200" alt = "a png"><br><br>
<a href='http://twitter.com/Paula_Moraga_' target='_blank'><i class='fa fa-twitter fa-fw'></i>&nbsp; @Paula_Moraga_</a><br>
<a href='https://Paula-Moraga.github.io/' target='_blank'><i class='fa fa-globe fa-fw'></i>&nbsp; www.PaulaMoraga.com</a><br>
</td>
</tr>
</table>

<br>

<i class='fa fa-map-marked-alt fa-fw'></i>&nbsp; Geospatial data analysis, statistical modeling<br>
<i class='fa fa-hospital fa-fw'></i>&nbsp; Spatial epidemiology, disease mapping, health surveillance<br>
<i class='fa fa-laptop fa-fw'></i>&nbsp; Development of R packages and interactive visualization applications<br>
<i class='fa fa-book fa-fw'></i>&nbsp; Author book Geospatial Health Data http://bit.ly/bookgeo<br>

---
background-image: url(./figures/overview.png)
background-size: contain

---

## Book

Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny (2019, Chapman & Hall/ CRC Press)    
http://www.paulamoraga.com/book-geospatial/

- Manipulate and transform point, areal and raster data,<br> and create maps with R

- Fit and interpret Bayesian spatial and spatio-temporal models with INLA and SPDE

- Model disease risk and quantify risk factors in different settings

- Interactive vizualizations, reproducible reports, dashboards, and Shiny apps

- Health examples but methods useful to analyze georeferenced data in other fields such as ecology or criminology

]

.pull-right[
<center>
<img src="./figures/bookcover.jpg"  style="margin-top:-5px; margin-left:-10px; width:85%;"/>
</center>
]

<!--
#---

# Outline

### Saturday

#### 9:00 - 10:15
<div style = 'margin-top: -20px;;'>

- Introduction to geospatial data and INLA

- Modeling areal data (lung cancer risk in Pennsylvania, USA)<br>
http://www.paulamoraga.com/tutorial-areal-data

#### 10:45 - 12:00
<div style = 'margin-top: -20px;;'>

- Dashboards to communicate results<br>
http://www.paulamoraga.com/tutorial-flexdashboard<br>

### Sunday

#### 9:00 - 10:15
<div style = 'margin-top: -20px;;'>

- Modeling geostatistical data (malaria prevalence in The Gambia)<br>
http://www.paulamoraga.com/tutorial-geostatistical-data

#### 10:45 - 12:00
<div style = 'margin-top: -20px;;'>

- Modeling point patterns (sloths occurrence in Costa Rica)<br>
http://www.paulamoraga.com/tutorial-point-patterns

-->

---

# Spatial Modeling and Visualization<br> with R-INLA

---

# Outline

We will learn how to develop and interpret spatial models with R-INLA (http://www.r-inla.org), and how to create maps and other visualizations that facilitate the communication with collaborators and policymakers.<br>
The course is hands-on with three parts:

- Introduction to geospatial data and R-INLA

- Modeling areal data (lung cancer risk in Pennsylvania, USA)<br>
http://www.paulamoraga.com/tutorial-areal-data

- Modeling geostatistical data (malaria prevalence in The Gambia)<br>
http://www.paulamoraga.com/tutorial-geostatistical-data

We will focus on health applications but the methods covered are applicable to many other fields that deal with georeferenced data such as ecology, environment or criminology

---

# R packages

```r
install.packages(c("sp", "spdep", "raster", "rgdal", "rgeos",
                   "ggplot2", "leaflet", "DT", "dplyr",
                   "SpatialEpi", "geoR"))

install.packages("INLA",
repos = "https://inla.r-inla-download.org/R/stable",
dep = TRUE)
```

---

# Geospatial data and methods

---
<img src="figures/snow-cholera-map.jpg" width="90%" style="display: block; margin: auto;" />

---
background-image: url(figures/snow-cholera-map-pump-top-margin.png)
background-size: contain

# John Snow's map of cholera, London, 1854

---
# Geospatial methods for disease surveillance

Geospatial methods use data on disease cases, individuals at risk, and risk factors
such as demographic and environmental factors to

- Understand geographic and temporal patterns
- Highlight areas of high risk and detect clusters
- Identify potential risk factors
- Measure inequalities
- Quantify the excess of disease risk close to a putative source
- Early detection of outbreaks

🗣 Maps and other visualizations enable to represent disease risk and risk factors, and interactive dashboards facilitate the communication of results

✍️ Results guide decision makers
to better allocate limited resources and
to design strategies for disease prevention and control

🌍 Many of these methods are also useful to analyze georeferenced data in others fields such as ecology, environment and criminology

---
background-image: url(figures/arealgeostatisticalpointpatterns.png)
background-size: contain
background-position: left 50%

# Types of spatial data

[Moraga and Lawson, Computational Statistics & Data Analysis, 2012](https://doi.org/10.1016/j.csda.2011.11.011)   
[Moraga et al., Parasites & Vectors, 2015](https://doi.org/10.1186/s13071-015-1166-x)   
[Moraga and Montes, Statistics in Medicine, 2011](https://doi.org/10.1002/sim.4160)

---
<div style="margin-top:-40px"></div>

# Areal data

**Standardized Mortality Ratio (SMR)** is often used to estimate disease risk

`$$SMR \mbox{ in area } i =
SMR_i = \frac{Y_i}{E_i} = \frac{\mbox{number observed cases in area } i}{\mbox{number expected cases in area } i}$$`

`$SMR_i = 1$` `$\rightarrow$` same number of cases observed as expected  
`$SMR_i > 1$` `$\rightarrow$` more number of cases observed than expected (high risk)
`$SMR_i < 1$` `$\rightarrow$` less number of cases cases observed than expected (low risk)

**Limitations**

SMRs may be misleading and unreliable in areas with small populations or rare diseases.
Models enable to incorporate covariates and borrow information from neighboring areas to obtain smoothed relative risks

---
<div style="margin-top:-40px"></div>

# Areal models

Model to estimate disease relative risk `$\theta_i$` in areas `$i=1,\ldots,n$`

`$$Y_i|\theta_i \sim Poisson(E_i \times \theta_i)$$`
`$$\log(\theta_i)  = \boldsymbol{z}_i \boldsymbol{\beta} + u_i + v_i$$`

- `$Y_i$`: number observed cases in area `$i$`
- `$E_i$`: number expected cases in area `$i$`
- `$\theta_i$`: relative risk in area `$i$`

Fixed effects quantify the effects of the covariates on the disease risk

- `$\boldsymbol{z}_i = (1, z_{i1}, \ldots, z_{ip})$` vector of the intercept and covariates   
`$\beta = (\beta_0, \beta_1, \ldots, \beta_p)'$` coefficient vector

Random effects represent residual variation which is not
explained by the available covariates

- `$u_i$`: structured spatial effect to account for the spatial
dependence between relative risks (areas that are close show
more similar risk than areas that are not close)
- `$v_i$`: unstructured spatial effect to account for independent noise

---
<div style="margin-top:-10px"></div>

# Geostatistical data

---
<div style="margin-top: -45px;">

# Geostatistical models

Models to predict prevalence at unsampled locations

`$$Y_i|P(\boldsymbol{x}_i)\sim \mbox{Binomial} (n_i, P(\boldsymbol{x}_i))$$`
`$$\mbox{logit}(P(\boldsymbol{x}_i))  = \boldsymbol{z}_i \boldsymbol{\beta} + S(\boldsymbol{x}_i) + u_i$$`

`$Y_i$` number people positive, `$n_i$` number people tested, `$P(x_i)$` prevalence at `$\boldsymbol{x}_i$`

Covariates based on characteristics known to affect disease transmission
(temperature, precipitation, vegetation, elevation, population density, etc.)

Random effects model residual variation not explained by covariates

---
<div style="margin-top:-40px"></div>

# Point patterns

Assume point pattern `$\{s_i: i=1, \ldots, n\}$` has been generated as a realization of a point process
`$Z = \{Z(s): s \in \mathbb{R}^2\}$`

Point processes are stochastic models that describe the locations of events of interest and possibly some additional information such as marks that inform about different types of events (e.g., cases and controls)

A point process model can be used to identify patterns in the distribution of the observed locations, estimate the intensity of events, and learn about the correlation between the locations and spatial covariates

---
<div style = 'margin-top: -30px'/>

# Coordinate Reference Systems (CRS)

1. **unprojected or geographic**: Latitude and Longitude for referencing location on the ellipsoid Earth  
(decimal degrees (DD) or degrees, minutes, and seconds (DMS))

2. **projected**: Easting and Northing for referencing location on 2-dimensional representation of Earth   
Common projection: [Universal Transverse Mercator (UTM)](https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system)  
Location is given by the zone number (60 zones), hemisphere (north or south), and Easting and Northing coordinates in the zone in meters<br>

---

# Integrated nested Laplace approximation (INLA)

---
<div style = 'margin-top: -40px'/>

# INLA

Integrated nested Laplace approximation
(INLA) is a computational approach to perform approximate Bayesian inference in **latent Gaussian models** such as generalized linear mixed models and spatial and spatio-temporal models

INLA uses a combination of analytical approximations and numerical integration to obtain approximated posterior distributions of the parameters and is very fast compared to MCMC

The combination of INLA and the stochastic partial differential equation (SPDE) approaches permits to analyze point-level data

INLA and SPDE can be applied with the `R` package `R-INLA`

**Resources**

INLA website http://www.r-inla.org/

Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny.
Moraga, P. CRC Press, 2019<br>
https://www.paulamoraga.com/book-geospatial/

Advanced Spatial Modeling with Stochastic Partial Differential Equations Using R and INLA.
Krainski, E., et al. CRC Press, 2019<br>
https://becarioprecario.bitbucket.io/spde-gitbook/

---
<div style = 'margin-top: -40px'>

# Latent Gaussian models

**Observations** belong to an exponential family with mean `$\mu_i=g^{-1}(\eta_i)$`<br>
(e.g. Gaussian, Poisson, Binomial, ...)
`$$y_i|\boldsymbol{x},\boldsymbol{\theta} \sim \pi(y_i|x_i,\boldsymbol{\theta})\ i=1, \ldots, n$$`

**Latent Gaussian field**
`$$\boldsymbol{x}|\boldsymbol{\theta}\sim N(\boldsymbol{\mu(\theta)},\boldsymbol{Q(\theta)}^{-1})$$`

**Hyperparameters** (not necessarily Gaussian)
`$$\boldsymbol{\theta} \sim \pi(\boldsymbol{\theta})$$`
**Linear predictor** accounts for effects of covariates in an additive way
`$$\eta_i=\alpha+\sum_{k=1}^{n_{\beta}}\beta_k z_{ki}+\sum_{j=1}^{n_f}f^{(j)}(u_{ji})$$`
`$\alpha$` is the intercept,
`$\{\beta_k\}$`'s quantify the linear effects of covariates `$\{z_{ki}\}$` on the response,
`$\{f^{(j)}(\cdot)\}$`'s are a set of random effects defined in terms of some covariates   `$\{u_{ji}\}$`
(e.g. iid, rw1, ar1, CAR, ...)

Latent Gaussian variables `$\boldsymbol{x}=(\alpha,\{\beta_k\},\{f^{(j)}\})|\boldsymbol{\theta}\sim N(\boldsymbol{\mu(\theta)},\boldsymbol{Q(\theta)}^{-1})$`

---
<div style = 'margin-top: -40px'/>

# INLA

Compute the posterior marginals for the latent Gaussian field
and the hyperparameters

`$$\pi(x_i|\boldsymbol{y})=\int \pi(x_i|\boldsymbol{\theta},\boldsymbol{y})\pi(\boldsymbol{\theta}|\boldsymbol{y})d\boldsymbol{\theta},\
    \pi(\theta_j|\boldsymbol{y})=\int \pi(\boldsymbol{\theta}|\boldsymbol{y})d\boldsymbol{\theta}_{-j}$$`
    
Use this form to construct **nested approximations**

`$$\tilde \pi(x_i|\boldsymbol{y})=\int \tilde \pi(x_i|\boldsymbol{\theta},\boldsymbol{y})\tilde \pi(\boldsymbol{\theta}|\boldsymbol{y})d\boldsymbol{\theta},\
    \tilde \pi(\theta_j|\boldsymbol{y})=\int \tilde \pi(\boldsymbol{\theta}|\boldsymbol{y})d\boldsymbol{\theta}_{-j}$$`
    
    
This approximation can be **integrated** numerically with respect to `$\boldsymbol{\theta}$`
`$$\tilde \pi(x_i|\boldsymbol{y})=\sum_k \tilde \pi(x_i|\boldsymbol{\theta_k},\boldsymbol{y})\tilde \pi(\boldsymbol{\theta_k}|\boldsymbol{y})\times \Delta_k$$`
`$$\tilde \pi(\theta_j|\boldsymbol{y})=\sum_l \tilde \pi(\boldsymbol{\theta_l^*}|\boldsymbol{y})\times \Delta_l^*$$`

`$\Delta_k$` area weight corresponding to `$\boldsymbol{\theta_k}$`<br>
`$\Delta_l^*$` area weight corresponding to `$\boldsymbol{\theta_l^*}$`

---
# INLA
  
The approximated posterior distributions  `$\tilde \pi (x_i|\boldsymbol{y})$` can be post-processed to compute quantities of interest like posterior expectations and quantiles

<table style = "border-top:0; border-bottom:0;">
<tr><td> Expectation </td><td> 95% C.I. </td></tr>
<tr>
<td><img src="./figures/inlapostprocessing1.png"/></td>
<td><img src="./figures/inlapostprocessing2.png"/></td>
</tr></table>

---
# R-INLA package

Install `INLA`

```r
install.packages("INLA",
repos = "https://inla.r-inla-download.org/R/stable",
dep = TRUE)

library(INLA)
```

---
<div style = 'margin-top: -40px'/>

# R-INLA package

Model
`$$Y_i | \eta_i, \sigma^2 \sim N(\eta_i, \sigma^2), i = 1, \ldots, n$$`
`$$\eta_i = \beta_0 + \beta_1 \times x_{1i} + \beta_2 \times x_{2i} + u_i,\ u_i \sim N(0, \sigma^2_u)$$`

1 Write the linear predictor as a formula object in R

```r
formula <- y ~ x1 + x2 + f(id, model = "iid")
```

<small>
Random effects specified with `f()`. First argument `id` is an index vector that specifies the element of the random effect that applies to each observation, second argument model name
</small>
 
 
2 Call the function `inla()`

```r
res <- inla(formula, family = "gaussian", data = d)
```
3 Inspect the `res` object wich contains the fitted model.<br>
Posteriors can be post-processed using a set of functions provided by R-INLA

```r
summary(res)
res$summary.fixed
```

<!--
DIAGNOSTICS
Cross-validation. Data for validation. Compare prediction with real data (correlation, MSE, coverage probabilities)

The package also provides estimates of different criteria to assess and compare Bayesian models. These include the model deviance information criterion (DIC) (Spiegelhalter et al. 2002), the Watanabe-Akaike information criterion (WAIC) (Watanabe 2010), the marginal likelihood, and the conditional predictive ordinates (CPO) (Held, Schrödle, and Rue 2010).

OTHER MODELS WITH INLA
Survival model (time to an event, time to death)
Zero-inflated models
Repeated measurements, random effects

DIFFERENCE WITH GAM

Easy to quantify uncertainty

GEOSTATISTICAL DATA

Triangulation guidelines. Results change with different triangulations

Units. Same data. Degrees or meters

Length triangels need to be smaller than the range (distance such that correlation between values is 0)

In the triangulation add extension to avoid boundary effects where the variance increases near the border. Sometimes we want to respect borders

BYM
scale.model = TRUE Make precision parameters of models with different priors comparable

neff parameters (independent parameters)
Equivalente replicates: #data/neff parameters

POINT PATTERN

Stationary. Invariant to shifts (s1,..., sn) Z(s1),... Z(sn)
Distribution Z(s1+h),... Z(sn+h) same in translation

inla.doc("rw2d")

Celcius degrees * 10

cellarea = 0.1 * 0.1 decimal degree = 0.01
-->

---

# Areal data

---
<div style="margin-top:-40px"></div>

# Areal data

**Standardized Mortality Ratio (SMR)** is often used to estimate disease risk

`$$SMR \mbox{ in area } i =
SMR_i = \frac{Y_i}{E_i} = \frac{\mbox{number observed cases in area } i}{\mbox{number expected cases in area } i}$$`

`$SMR_i = 1$` `$\rightarrow$` same number of cases observed as expected  
`$SMR_i > 1$` `$\rightarrow$` more number of cases observed than expected (high risk)    
`$SMR_i < 1$` `$\rightarrow$` less number of cases cases observed than expected (low risk)

**Example**

`$SMR_i = \frac{Y_i}{E_i} = \frac{200}{100} = 2 > 1$` `$\rightarrow$` area `$i$` high risk (observed `$>$` expected)
`$SMR_i = \frac{Y_i}{E_i} = \frac{100}{200} = 0.5 < 1$` `$\rightarrow$` area `$i$` low risk (observed `$<$` expected)

---
<div style="margin-top:-20px"></div>

### Indirect standardization to calculate expected cases

The expected number of cases in area `$i$` is the number of cases one would expect if the population in area `$i$` behaved the way the standard population behaves

Standard population is considered as the whole population (all areas)

Population is stratified by several factors (e.g., age and sex)

`$$E_i = \sum_{j=1}^m r_j^{(std)} n_j^{(i)}$$`

- `$r_j^{(std)}=\frac{\mbox{number cases}}{\mbox{population}}$` rate in stratum `$j$` in the standard population

- `$n_j^{(i)}$`: population in stratum `$j$` of area `$i$`

### Limitations

---
<div style="margin-top:-40px"></div>

# Areal models

Model to estimate disease relative risk `$\theta_i$` in areas `$i=1,\ldots,n$`

`$$Y_i|\theta_i \sim Poisson(E_i \times \theta_i)$$`
`$$\log(\theta_i)  = \boldsymbol{z}_i \boldsymbol{\beta} + u_i + v_i$$`

- `$Y_i$`: number observed cases in area `$i$`
- `$E_i$`: number expected cases in area `$i$`
- `$\theta_i$`: relative risk in area `$i$`

Fixed effects quantify the effects of the covariates on the disease risk

- `$\boldsymbol{z}_i = (1, z_{i1}, \ldots, z_{ip})$` vector of the intercept and covariates   
`$\beta = (\beta_0, \beta_1, \ldots, \beta_p)'$` coefficient vector

Random effects represent residual variation which is not
explained by the available covariates

---
<div style = 'margin-top: -40px; margin-bottom: -20px;'>

# Tutorial

### Modeling

Areal data (lung cancer risk in Pennsylvania, USA)

http://www.paulamoraga.com/tutorial-areal-data

---

# Geostatistical data

---
<div style="margin-top:-10px"></div>

# Geostatistical data

---
<div style="margin-top: -45px;">

# Geostatistical models

Models to predict prevalence at unsampled locations

`$$Y_i|P(\boldsymbol{x}_i)\sim \mbox{Binomial} (n_i, P(\boldsymbol{x}_i))$$`
`$$\mbox{logit}(P(\boldsymbol{x}_i))  = \boldsymbol{z}_i \boldsymbol{\beta} + S(\boldsymbol{x}_i) + u_i$$`

`$Y_i$` number people positive, `$n_i$` number people tested, `$P(x_i)$` prevalence at `$\boldsymbol{x}_i$`

Covariates based on characteristics known to affect disease transmission
(temperature, precipitation, vegetation, elevation, population density, etc.)

Random effects model residual variation not explained by covariates

---

# Stochastic partial differential equation approach (SPDE)

In the SPDE approach, the continuously indexed Gaussian field `$S$` is represented as a discretely indexed Gaussian Markov random field (GMRF) by means of a finite basis function defined on a triangulation of the region of study

.pull-left[
`$$S(\mathbf{x})=\sum_{g=1}^G \psi_g(\mathbf{x}) S_g,$$`
- `$\psi_g(\cdot)$` piecewise polynomial basis functions on each triangle
- `$\{S_g\}$` zero-mean Gaussian distributed
- `$G$` number of vertices in the triangulation
]

]

---
# Projection matrix

`$S(\mathbf{x})$` weighted average of the values of the GMRF in the vertices of
the triangle containing the location of the observation. <br>
Weights = barycentric coordinates

`$$S(\mathbf{x}) \approx \frac{A_{1}}{A}S_1 + \frac{A_{2}}{A}S_2 + \frac{A_{3}}{A}S_3$$`

---
# Projection matrix

`$$S(\mathbf{x}_i)  \approx \sum_{g=1}^G A_{ig}S_g$$`
where `$A$` is a projection matrix that maps the GMRF from the observations to the triangulation nodes

Row `$i$` in `$A$` corresponding to observation `$i$` has
possibly three non-zero values at columns that represent the vertices of the triangle containing the point (non-zero values are equal to the barycentric coordinates)

`$$A = \left(\begin{array}{ccccc} A_{11} & A_{12} & A_{13} & \dots  & A_{1G} \\ A_{21} & A_{22} & A_{23} & \dots  & A_{2G} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ A_{n1} & A_{n2} & A_{n3} & \dots  & A_{nG} \end{array}\right) = \left(\begin{array}{ccccc}  1 & 0 & 0 & \dots  & 0\\ 0 & 0 & 1 & \dots  & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ .2 & .2 & 0 & \dots  & .6 \\ \end{array}\right)$$`

---
<div style = 'margin-top: -40px; margin-bottom: -20px;'></div>

# Tutorial

### Modeling

Geostatistical data (malaria prevalence in The Gambia)

http://www.paulamoraga.com/tutorial-geostatistical-data

<!--

#---

# Point patterns

#---

# Point patterns

Assume point pattern `$\{s_i: i=1, \ldots, n\}$` has been generated as a realization of a point process
`$Z = \{Z(s): s \in \mathbb{R}^2\}$`

#---
# Point processes

Intensity of the process: mean number of events per unit area at location s
`$$\lambda(s) = lim_{|ds|\rightarrow 0} \frac{E[N(ds)]}{|ds|}$$`

#---

# Homogeneous Poisson process

The simplest point process model is the homogeneous Poisson process

1. the number of events in any region `$A$` follows a Poisson distribution with mean `$\lambda |A|$`, where `$\lambda$` is a constant value denoting the intensity and `$|A|$` is the area of `$A$`
2. given that there are `$n$` events inside `$A$`, the locations of these events are independent and uniformly distributed in `$A$`

Thus, if a point pattern arises as a realization of an homogeneous Poisson process, an event is equally likely to occur at any location within the study region, regardless of the locations of other events

#---
# Log-Gaussian Cox process

A log-Gaussian Cox process (LGCP) is a Poisson process with a varying intensity which is itself a stochastic process of the form
`$$\Lambda(s) = exp(\eta(s))$$`
where `$\eta = \{\eta(s): s \in \mathbb{R}^2\}$` is a Gaussian process

Conditional on `$\Lambda(s)$`, the point process is a Poisson process with intensity `$\Lambda(s)$`
1. the number of events in any region `$A$` is distributed as `$Poisson(\int_A \Lambda(s)ds)$`
2. the locations of events are an independent random sample with probability density proportional to `$\Lambda(s)$`

Log-Gaussian Cox processes are typically used to model phenomena that are environmentally driven

#---

# Fitting a log-Gaussian Cox process model

Assume point pattern `$\{s_i: i=1, \ldots, n\}$` has been generated as a realization of a LGCP with intensity `$\Lambda(s)= exp(\eta(s))$`

A LGCP can be fitted by discretizing the study region into a grid with `$n_1 \times n_2$` cells `$\{s_{ij}: i=1,\ldots,n_1, j=1,\ldots,n_2\}$`.
`$|s_{ij}|$` area of cell `$s_{ij}$`

Mean number of events in cell `$s_{ij}$`:
`$\Lambda_{ij}=\int_{s_{ij}} exp(\eta(s))ds \approx |s_{ij}| exp(\eta_{ij})$`

`$$y_{ij}|\eta_{ij} \sim Poisson(|s_{ij}| exp(\eta_{ij}))$$`
`$$\eta_{ij} = \beta_0 + \beta_1 \times cov(s_{ij}) + f_s(s_{ij}) + f_u(s_{ij})$$`
- `$y_{ij}$` observed number of locations in grid cell `$s_{ij}$`
- `$\beta_0$` intercept, `$cov(s_{ij})$` covariate at `$s_{ij}$`, `$\beta_1$` coefficient of `$cov(s_{ij})$`
- `$f_s()$` spatially structured random effect (CAR model on a regular lattice)
- `$f_u()$` unstructured random effect

#---
<div style = 'margin-top: -40px; margin-bottom: -20px;'>

# Tutorial

### Modeling

Point patterns (sloths occurrence in Costa Rica)

http://www.paulamoraga.com/tutorial-point-patterns

`spocc` package: https://docs.ropensci.org/spocc/

-->

---
class: middle, inverse

# Interactive visualizations and dashboards <br> to communicate results

<br>

## HTML widgets

## R Markdown

## flexdashboard

## Shiny

---
class: center, middle, inverse

# HTML widgets

---
# HTML widgets

HTML widgets are interactive web visualizations built with JavaScript

http://www.htmlwidgets.org/

http://www.htmlwidgets.org/showcase_leaflet.html

---
<div style = 'margin-top: -40px; margin-bottom: -40px;'>

# Leaflet

http://rstudio.github.io/leaflet/

Basemaps: http://leaflet-extras.github.io/leaflet-providers/preview/index.html

<!--
addTiles()
https://rstudio.github.io/leaflet/basemaps.html

Interactive map over time

https://github.com/socib/Leaflet.TimeDimension#examples-and-basic-usage

https://github.com/timelyportfolio/leaftime

https://kepler.gl/demo/earthquakes
https://kepler.gl/demo/map?mapUrl=https://dl.dropboxusercontent.com/s/cq7d5wmf7kfq3bl/keplergl_hu7426c.json

https://github.com/SymbolixAU/mapdeck/issues/221
https://symbolixau.github.io/mapdeck/articles/mapdeck.html

-->

---
# Dygraphs
http://rstudio.github.io/dygraphs/

---
# DataTable
http://rstudio.github.io/DT/

---
class: center, middle, inverse

# Reproducible documents with<br> R Markdown

---

# R Markdown

R Markdown can be used to turn our analysis into fully reproducible documents that can be shared with others. Output formats include HTML, PDF or Word. 
An R Markdown file is written with Markdown syntax with embedded R code, and can include narrative text, tables and visualizations

https://rmarkdown.rstudio.com/

http://www.paulamoraga.com/book-geospatial/sec-rmarkdown.html

---
# YAML header

```r
---
title: "An R Markdown document"
author: "Paula Moraga"
date: "1 July 2019"
output: pdf_document
---
```

---
<div style="margin-top:-25px"></div>

# Markdown syntax

`**bold text** *italic text*`

**bold text** &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; *italic text*

```rmarkdown
- unordered item
- unordered item
    1. first item
    2. second item
    3. third item
```

- unordered item
- unordered item
    1. first item
    2. second item
    3. third item

]

```rmarkdown
# First-level header
## Second-level header
### Third-level header
```

# First-level header
## Second-level header
### Third-level header

]

```rmarkdown
$$\int_0^\infty e^{-x^2} dx=\frac{\sqrt{\pi}}{2}$$
```

`$$\int_0^\infty e^{-x^2} dx=\frac{\sqrt{\pi}}{2}$$`

---
# R code chunks

````r
```{r, warning = FALSE}
# R code to be executed
```
````

Options:

- `echo=FALSE` code will not be shown in the document, but it will run and the output will be displayed in the document
- `eval=FALSE` code will not run, but it will be shown in the document
- `include=FALSE` code will run, but neither the code nor the output will be included in the document
- `results='hide'` output will not be shown, but the code will run and will be displayed in the document
- `cache=TRUE` code chunk is not executed if it has been executed before and nothing in the code chunk has changed since then
- `error=FALSE`, `warning=FALSE`, `message=FALSE` supress errors, warnings or messages

---
background-image: url(figures/rmarkdown.png)
background-size: contain
background-position: left 70%

# R Markdown

---
class: center, middle, inverse

# Interactive dashboards with<br> flexdashboard

---

# Interactive dashboards with flexdashboard

The R package **flexdashboard** uses R Markdown to publish a group of related data visualizations as a dashboard

https://rmarkdown.rstudio.com/flexdashboard/   
https://rmarkdown.rstudio.com/flexdashboard/examples.html
http://www.paulamoraga.com/book-geospatial/sec-flexdashboard.html
http://www.paulamoraga.com/book-geospatial/sec-dashboardswithshiny.html

---
# Layout

---
background-image: url(figures/pm3.gif)
background-size: contain
background-position: left 70%

# Interactive dashboards with flexdashboard

---
class: center, middle, inverse

# Shiny web applications

---

# Shiny web applications

**Shiny** is a web application framework for R that enables to build interactive web applications

https://shiny.rstudio.com/

http://www.paulamoraga.com/book-geospatial/sec-shiny.html

Examples

https://shiny.rstudio.com/gallery/single-file-shiny-app.html

https://shiny.rstudio.com/gallery/telephones-by-region.html

---
<div style="margin-top:-40px"></div>

# SpatialEpiApp

Shiny app for disease risk estimation, cluster detection, and interactive viz

- Risk estimates by fitting Bayesian models with [INLA](http://www.r-inla.org/)
- Detection of clusters by using the scan statistics in [SaTScan](https://www.satscan.org/)

http://www.paulamoraga.com/software/

```r
library(devtools)
install_github("Paula-Moraga/SpatialEpiApp")
library(SpatialEpiApp)
run_app()
```

---
<div style="margin-top:-40px"></div>

# Structure of a Shiny App

A Shiny app can be built by creating a directory (called, for example, `appdir`) that contains an R file (called, for example, `app.R`) with three components:

- `ui` user interface object which controls layout and appearance of the app
- `server()` function with instructions to build objects displayed in the `ui`
- call to `shinyApp()` that creates the Shiny app from the `ui`/`server` pair

```r
# define user interface object
ui <- fluidPage( )
# define server() function
server <- function(input, output){ }
# call to shinyApp() which returns the Shiny app
shinyApp(ui = ui, server = server)
```

Save `app.R` inside directory `appdir`. Launch the app:

```r
library(shiny)
runApp("appdir_path")
```

Directory `appdir` can also contain data or other R scripts needed by the app.
We can also write two separate files `ui.R` and `server.R`.
This permits an easier management of the code for large apps

---
# Inputs

---
# Outputs

Plots, tables, texts, images

---
<div style="margin-top:-40px"></div>

# Inputs, outputs and reactivity

Inputs: we can interact with the app by modifying their values   
Outputs: objects we want to show in the app

```r
ui <- fluidPage(
  *Input(inputId = myinput, label = mylabel, ...)
  *Output(outputId = myoutput, ...)
)
server <- function(input, output){
  output$myoutput <- render*({
    # code to build the output.
    # If it uses an input value (input$myinput),
    # the output will be rebuilt whenever
    # the input value changes
  })}
```

We can include inputs by writing an `*Input()` function in the `ui`<br>
(e.g., `textInput()`, `dateRangeInput()`, `fileInput()`)

Outputs can be created by writing an `*Output()` function in the `ui`, and the R code to build the output inside a `render*()` function in `server()`
(e.g., `textOutput()` and `renderText()`,
`tableOutput()` and `renderTable()`)

`fluidPage()` creates a display that automatically adjusts the app to the dimensions of the browser window

---
# Inputs, outputs and reactivity

https://shiny.rstudio.com/gallery/single-file-shiny-app.html

---
# Sharing a Shiny app

Option 1: Share R scripts with other users

- users need R

```r
library(shiny)
runApp("appdir_path")
```

Option 2: Host app as a web page at its own URL

- users do not need R
- app can be navigated through the internet with a web browser
- host apps on own servers or using one of the ways RStudio offers such as [shinyapps.io](http://www.shinyapps.io/) and [Shiny Server](https://www.rstudio.com/products/shiny/shiny-server/)

Example: https://paulamoraga.shinyapps.io/spatialepiapp/

---
<div style = 'margin-top: -40px; margin-bottom: -20px;'>

# Tutorial

### Dashboards to communicate results

Interactive dashboards with **flexdashboard**

http://www.paulamoraga.com/tutorial-flexdashboard

Example (lung cancer risk in Pennsylvania, USA)

http://www.paulamoraga.com/tutorial-flexdashboard-example

<!--
#---
class: inverse, middle, center
# Tutorials
#---
# R packages for tutorials

```r
install.packages(c("sp", "spdep", "raster", "rgdal",
                   "leaflet", "ggplot2", "geoR",
                   "dplyr", "SpatialEpi"))
install.packages("INLA",
repos = "https://inla.r-inla-download.org/R/stable",
dep = TRUE)
```
#---
<div style = 'margin-top: -20px; margin-bottom: -20px;'>
# Modeling
<div style = 'margin-top: -20px; margin-bottom: -20px;'>
Areal data (lung cancer risk in Pennsylvania, USA)
http://www.paulamoraga.com/tutorial-areal-data
Geostatistical data (malaria prevalence in The Gambia)
http://www.paulamoraga.com/tutorial-geostatistical-data
Point patterns (sloths occurrence in Costa Rica)
http://www.paulamoraga.com/tutorial-point-patterns
# Dashboards to communicate results
<div style = 'margin-top: -20px; margin-bottom: -20px;'>
Interactive dashboards with **flexdashboard**
http://www.paulamoraga.com/book-geospatial/sec-flexdashboard.html
http://www.paulamoraga.com/book-geospatial/sec-dashboardswithshiny.html
Example (lung cancer risk in Pennsylvania, USA)
http://www.paulamoraga.com/tutorial-flexdashboard-example
-->

</style>

---
background-image: url(./figures/Admissions-com-2020-COVID.png)
background-size: contain
background-position: top right

# Join my GeoHealth research group at KAUST!

I am looking for outstanding PhD students and Postdocs to join my group

👩‍💻 Potential research areas include the development of innovative statistical methods and computational tools for health and environmental applications

- Disease mapping, detection of clusters, early detection of outbreaks
- Integration of spatial and spatio-temporal data
- Development of R packages for data analysis and visualization

💪 Work closely with collaborators at KAUST and around the world

✈️ Generous travel funding to attend conferences and workshops

✨ Excellent research environment, free tuition, monthly living allowance, free housing, medical insurance, relocation support

KAUST http://kaust.edu.sa  &nbsp;&nbsp; Statistics Program http://stat.kaust.edu.sa   
Admissions http://admissions.kaust.edu.sa &nbsp;&nbsp; http://studyat.kaust.edu.sa

---

# References

<small>

Moraga, P. (2019). *Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny*. Chapman & Hall/CRC Press

Moraga, P., et al. (2019). epiflows: an R package for risk assessment of travel-related spread of disease. *F1000Research*, 7:1374

Moraga, P. (2018). Small Area Disease Risk Estimation and Visualization Using R. *The R Journal*, 10(1):495-506

Moraga, P. (2017). SpatialEpiApp: A Shiny Web Application for the analysis of Spatial and Spatio-Temporal Disease Data. *Spatial and Spatio-temporal Epidemiology*, 23:47-57

Moraga, P., et al. (2017). A geostatistical model for combined analysis of point-level and area-level data using INLA and SPDE. *Spatial Statistics*, 21:27-41

]

Moraga, P. and Kulldorff, M. (2016). Detection of spatial variations in temporal trends with a quadratic function. *Statistical Methods for Medical Research*, 25(4):1422-1437

Hagan, J. E., Moraga, P., et al. (2016). Spatio-temporal determinants of urban leptospirosis transmission: Four-year prospective cohort study of slum residents in Brazil. *PLOS Neglected Tropical Diseases*, 10(1): e0004275

Moraga, P., et al. (2015). Modelling the distribution and transmission intensity of lymphatic filariasis in sub-Saharan Africa prior to scaling up interventions: integrated use of geostatistical and mathematical modelling. *Parasites & Vectors*, 8:560

</small>

---

<div style = 'margin-top: 120px; margin-bottom: 40px;'>
<span style = 'font-size: 68px; line-height:1.5; font-weight:bold'>
Thanks!<br>
</span>
</div>

<span style = 'font-size: 38px; font-weight:bold'>
Paula Moraga<br>
</span>

<span style = 'font-size: 28px; line-height:1.5'>
<a href='http://twitter.com/Paula_Moraga_' target='_blank'> <i class='fa fa-twitter fa-fw'></i>&nbsp; @Paula\_Moraga\_</a><br>
<a href='https://Paula-Moraga.github.io/' target='_blank'><i class='fa fa-globe fa-fw'></i>&nbsp; www.PaulaMoraga.com</a><br>
</span>

<style>

.pull-left-50 {
  float: left;
  width: 50%;
}
.pull-right-50 {
  float: right;
  width: 50%;
}

.pull-left-60 {
  float: left;
  width: 60%;
}
.pull-right-40 {
  float: right;
  width: 40%;
}

.pull-right-40-padding {
  float: right;
  width: 38%;
  padding-left: 10px
}

</style>