Paula Moraga

<div class = "content">
<br>
<center>
<div style = 'margin-top: -80px; margin-bottom: -35px;'>
<p class="text-center" style = 'font-size: 52px; line-height:1.5; font-weight:bold'>Disease Risk Modeling and <br> Visualization using R</p>
</div>
</center>

<br>
<div style = 'padding: 40px; padding-left: 40px; font-size: 32px; font-weight:bold; margin-bottom: -70px;'>
Paula Moraga, Ph.D.
</div>

<div style = 'padding: 40px; padding-left: 40px; font-size: 26px; line-height:1.5; margin-bottom: 40px;'>
<a href='http://twitter.com/Paula_Moraga_' target='_blank'> <i class='fa fa-twitter fa-fw'></i>&nbsp; @Paula_Moraga_</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<a href='https://Paula-Moraga.github.io/' target='_blank'><i class='fa fa-globe fa-fw'></i>&nbsp; www.PaulaMoraga.com</a>
<small>
<a href='http://www.paulamoraga.com/presentation-geospatial-dataviz' target='_blank'><i class='fa fa-link fa-fw'></i>&nbsp; www.paulamoraga.com/presentation-geospatial-dataviz</a>
</small>

</div>

---

## Geospatial modeling and visualization with R

- Methods for analyzing geospatial health data that make it possible to quantify disease burden, understand geographic and temporal patterns, identify risk factors, and measure inequalities

- Maps and other visualizations for representing disease risk and risk factors, and presentation options such as interactive dashboards to communicate results

- Examples focus on geospatial health data, but the methods are also useful in other fields that use georeferenced data, including epidemiology, ecology, demography and criminology

---

## References

Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny (2019, Chapman & Hall/CRC Press)    
http://www.paulamoraga.com/book-geospatial/

---
background-image: url(figures/snow-cholera-map-pump-top-margin.png)
background-size: contain

# John Snow's map of cholera, London, 1854

---
# Geospatial methods for disease surveillance

Geospatial methods can be used to

- Understand geographic and temporal patterns
- Highlight areas of high risk and detect clusters
- Measure inequalities
- Identify potential risk factors
- Quantify the excess of disease risk close to a putative source
- Detect outbreaks early

This information can guide decision makers and programme managers
to better allocate limited resources and
to design strategies for disease prevention and control

**Data**: Methods use information about disease cases, individuals at risk, and potential risk factors
such as demographic and environmental factors

**Disease models** describe the variability in disease risk as a function
of explanatory variables and random effects to account for unexplained variability.
Models can be extended in a straightforward way to
estimate covariate effects and to handle
spatio-temporal data and multiple diseases

---
background-image: url(figures/arealgeostatisticalpointpatterns.png)
background-size: contain
background-position: left 50%

# Types of spatial data

[Moraga and Lawson, Computational Statistics & Data Analysis, 2012](https://doi.org/10.1016/j.csda.2011.11.011)   
[Moraga et al., Parasites & Vectors, 2015](https://doi.org/10.1186/s13071-015-1166-x)   
[Moraga and Montes, Statistics in Medicine, 2011](https://doi.org/10.1002/sim.4160)

---

# Areal data

Disease risk is often estimated by the **Standardized Mortality Ratio (SMR)**
`$$SMR_i = \frac{Y_i}{E_i}$$`

- `$Y_i$`: number of observed cases in area `$i$`
- `$E_i$`: number of expected cases in area `$i$` (indirect standardization)

If `$SMR_i = 1$`, as many cases were observed as expected  
If `$SMR_i > 1$`, more cases were observed than expected  
If `$SMR_i < 1$`, fewer cases were observed than expected

---

# Expected cases

Expected cases are calculated using indirect standardization

Population is stratified by several factors (e.g., age and sex)

Standard population is considered as the whole population (all areas)

`$E_i$` is the expected number of cases in area `$i$` and represents the number of cases one would expect if the population in area `$i$` behaved the way the standard population behaves

`$$E_i = \sum_{j=1}^m r_j^{(std)} n_j^{(i)}$$`

- `$r_j^{(std)}$`: rate (number of cases divided by population) in stratum `$j$` of the standard population

- `$n_j^{(i)}$`: population in stratum `$j$` of area `$i$`
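
---

# Expected cases: a sketch in R

The calculation above can be sketched with base R. The data frame `d`, its column names and all numbers are made up for illustration; the standard population is taken to be all areas combined, as on the previous slide.

```r
# Hypothetical data: 2 areas x 2 age strata (all numbers are made up)
d <- data.frame(
  area    = c("A", "A", "B", "B"),
  stratum = c("young", "old", "young", "old"),
  cases   = c(5, 20, 10, 15),
  pop     = c(1000, 500, 3000, 500)
)

# r_j^(std): stratum-specific rates in the standard population (all areas)
totals <- aggregate(cbind(cases, pop) ~ stratum, data = d, FUN = sum)
totals$rate <- totals$cases / totals$pop

# E_i = sum_j r_j^(std) * n_j^(i)
d$expected <- totals$rate[match(d$stratum, totals$stratum)] * d$pop
res <- aggregate(cbind(cases, expected) ~ area, data = d, FUN = sum)

# SMR_i = Y_i / E_i
res$SMR <- res$cases / res$expected
res
```

Because the standard population is the whole study region, the expected counts add up to the total number of observed cases.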

---
<div style="margin-top:-30px"></div>

# Standardized Mortality Ratio (SMR)

`$SMR$` in area `$i$`

`$$SMR_i = \frac{Y_i}{E_i}$$`

`$Y_i$`: number of observed cases, `$E_i$`: number of expected cases in area `$i$`

**Example**

`$SMR_i = \frac{Y_i}{E_i} = \frac{100}{200} = 0.5 < 1$` `$\rightarrow$` area `$i$` low risk

`$SMR_i = \frac{Y_i}{E_i} = \frac{200}{100} = 2 > 1$` `$\rightarrow$` area `$i$` high risk

**Limitations**

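SMRs may be misleading and unreliable in areas with small populations or for rare diseases, because expected counts are low and a few cases more or fewer change the ratio substantially.
Models make it possible to incorporate covariates and borrow information from neighboring areas to obtain smoothed relative risks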
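
---

# SMR variability: a sketch in R

A small simulation can illustrate why raw SMRs are unstable when expected counts are low. This is only a sketch: the true relative risk is fixed at 1, and the two expected counts are arbitrary values chosen for illustration.

```r
set.seed(1)

# True relative risk is 1 everywhere; only the expected count differs
theta   <- 1
E_small <- 2     # e.g., small population or rare disease
E_large <- 200

# Simulate observed counts Y ~ Poisson(E * theta) and compute SMR = Y / E
smr_small <- rpois(10000, E_small * theta) / E_small
smr_large <- rpois(10000, E_large * theta) / E_large

# SMRs scatter widely when E is small (sd is about 1 / sqrt(E))
c(sd_small = sd(smr_small), sd_large = sd(smr_large))
range(smr_small)
```

With the same underlying risk, areas with low expected counts can show SMRs far from 1 purely by chance.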

---

# Areal models

Model to estimate disease relative risk `$\theta_i$` in areas `$i=1,\ldots,n$`

`$$Y_i|\theta_i \sim Poisson(E_i \times \theta_i)$$`
`$$\log(\theta_i)  = \boldsymbol{z}_i \boldsymbol{\beta} + u_i + v_i$$`

- `$Y_i$`: number of observed cases in area `$i$`
- `$E_i$`: number of expected cases in area `$i$`
- `$\theta_i$`: relative risk in area `$i$`

Fixed effects quantify the effects of the covariates on the disease risk

- `$\boldsymbol{z}_i = (1, z_{i1}, \ldots, z_{ip})$`: vector of the intercept and covariates
- `$\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)'$`: coefficient vector

Random effects represent residual variation which is not
explained by the available covariates

- `$u_i$`: structured spatial effect to account for the spatial
dependence between relative risks (areas that are close show
more similar risk than areas that are not close)
- `$v_i$`: unstructured spatial effect to account for independent noise
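
---

# Areal models: fixed-effects sketch in R

As a rough illustration of the Poisson part of the model, the sketch below simulates counts and fits a GLM with `log(E_i)` as an offset. It covers the fixed effects only: the random effects `$u_i$` and `$v_i$` are left out, and fitting them would need a tool such as R-INLA. All data and coefficient values are simulated assumptions.

```r
set.seed(42)
n <- 500

# Simulated inputs (hypothetical): expected counts and one covariate
E <- runif(n, min = 10, max = 100)
z <- rnorm(n)

# Simulate from Y_i ~ Poisson(E_i * theta_i), log(theta_i) = b0 + b1 * z_i
beta  <- c(-0.2, 0.5)
theta <- exp(beta[1] + beta[2] * z)
Y     <- rpois(n, E * theta)

# log(E_i) enters as an offset, so coefficients act on the log relative risk
fit <- glm(Y ~ z + offset(log(E)), family = poisson)
coef(fit)                    # estimates should be close to beta

# Fitted relative risks: fitted mean divided by expected counts
theta_hat <- fitted(fit) / E
head(cbind(SMR = Y / E, theta_hat))
```

Exponentiating a coefficient gives the multiplicative change in relative risk for a one-unit increase in the covariate.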

---
<div style="margin-top:-10px"></div>

# Geostatistical data

---
<div style="margin-top: -45px;"></div>

# Geostatistical models

Models to predict prevalence at unsampled locations

`$$Y_i|P(\boldsymbol{x}_i)\sim \mbox{Binomial} (n_i, P(\boldsymbol{x}_i))$$`
`$$\mbox{logit}(P(\boldsymbol{x}_i))  = \boldsymbol{z}_i \boldsymbol{\beta} + S(\boldsymbol{x}_i) + u_i$$`

`$Y_i$`: number of people testing positive, `$n_i$`: number of people tested, `$P(\boldsymbol{x}_i)$`: prevalence at location `$\boldsymbol{x}_i$`

Covariates based on characteristics known to affect disease transmission
(temperature, precipitation, vegetation, elevation, population density, etc.)

Random effects model residual variation not explained by covariates
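
---

# Geostatistical models: fixed-effects sketch in R

The binomial part of the model can be sketched with a logistic GLM. This simplified example uses a single simulated covariate (called `altitude` here as an assumption) and omits the spatial term `$S(\boldsymbol{x}_i)$` and `$u_i$`, so it predicts from covariates alone rather than from spatial correlation.

```r
set.seed(7)
n_loc <- 300

# Simulated survey (hypothetical): people tested and altitude at each location
n_tested <- sample(20:100, n_loc, replace = TRUE)
altitude <- runif(n_loc, 0, 1)        # rescaled to [0, 1]

# Simulate from Y_i ~ Binomial(n_i, P_i), logit(P_i) = b0 + b1 * altitude_i
p_true <- plogis(-1 - 1.5 * altitude)
Y <- rbinom(n_loc, size = n_tested, prob = p_true)

# Binomial GLM with logit link: response is cbind(successes, failures)
fit <- glm(cbind(Y, n_tested - Y) ~ altitude, family = binomial)

# Predict prevalence at new (unsampled) altitudes on the probability scale
new_sites <- data.frame(altitude = c(0.1, 0.5, 0.9))
pred <- predict(fit, newdata = new_sites, type = "response")
round(pred, 3)
```

`type = "response"` applies the inverse logit, so predictions are prevalences between 0 and 1.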

---
# Point patterns

Point processes are stochastic models that describe the locations of events of interest and possibly some additional information such as marks that inform about different types of events (e.g., cases and controls)

Assume point pattern `$\{s_i: i=1, \ldots, n\}$` has been generated as a realization of a point process.
A point process model can be used to identify patterns in the distribution of the observed locations, estimate the intensity of events, and learn about the correlation between the locations and spatial covariates
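
---

# Point patterns: a sketch in R

A minimal base R sketch of intensity estimation, assuming a homogeneous Poisson process on the unit square. The intensity value and the 4 x 4 grid are arbitrary choices; dedicated packages such as **spatstat** offer far more complete tools.

```r
set.seed(123)

# Simulate a homogeneous Poisson process on the unit square
lambda <- 200                          # true intensity (events per unit area)
n <- rpois(1, lambda)                  # number of events
s <- data.frame(x = runif(n), y = runif(n))

# Overall intensity estimate: events divided by area (area = 1 here)
lambda_hat <- n / 1

# Quadrat counts on a 4 x 4 grid: roughly equal counts suggest no clustering
breaks <- seq(0, 1, by = 0.25)
counts <- table(cut(s$x, breaks, include.lowest = TRUE),
                cut(s$y, breaks, include.lowest = TRUE))
counts

# Intensity per quadrat (each quadrat has area 1/16)
counts / (1 / 16)
```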

---
# Coordinate Reference Systems (CRS)

1. **unprojected or geographic**: Latitude/Longitude for referencing a location on the ellipsoid that approximates the Earth.
2. **projected**: Easting/Northing for referencing a location on a 2-dimensional representation of the Earth.   
Common projection: [Universal Transverse Mercator (UTM)](https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system).
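
---

# CRS: a sketch in R

The link between geographic coordinates and UTM can be sketched as follows. The location is an arbitrary example, the zone formula ignores the special zones around Norway and Svalbard, and the use of the **sf** package for the projection step is an assumption (it only runs if **sf** is installed).

```r
# Hypothetical location: central London (longitude, latitude in degrees)
lon <- -0.1276
lat <- 51.5072

# UTM zones are 6 degrees wide, numbered from 1 starting at 180 W
zone <- floor((lon + 180) / 6) + 1

# EPSG codes for WGS 84 / UTM: 326xx (north), 327xx (south)
epsg <- ifelse(lat >= 0, 32600, 32700) + zone
c(zone = zone, epsg = epsg)

# Optional: project with the sf package, if it is installed
if (requireNamespace("sf", quietly = TRUE)) {
  p <- sf::st_sfc(sf::st_point(c(lon, lat)), crs = 4326)  # WGS 84 lon/lat
  sf::st_coordinates(sf::st_transform(p, crs = epsg))     # easting, northing
}
```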

---
class: center, middle, inverse

# Interactive visualizations and dashboards <br> to communicate results

### HTML widgets, R Markdown, flexdashboard, Shiny

---
# HTML widgets

HTML widgets are interactive web visualizations built with JavaScript

http://www.htmlwidgets.org/

http://www.htmlwidgets.org/showcase_leaflet.html

---
# Leaflet
http://rstudio.github.io/leaflet/
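
A minimal sketch of a leaflet map, assuming the **leaflet** package is installed. The marker coordinates are an approximate, illustrative location for the Broad Street pump from John Snow's map and are not taken from these slides.

```r
library(leaflet)

# Approximate location of the Broad Street pump in Soho, London
# (coordinates are illustrative, not taken from the slides)
pump <- data.frame(lng = -0.1366, lat = 51.5133, name = "Broad Street pump")

m <- leaflet(pump) %>%
  addTiles() %>%                                   # OpenStreetMap basemap
  addCircleMarkers(lng = ~lng, lat = ~lat, popup = ~name) %>%
  setView(lng = -0.1366, lat = 51.5133, zoom = 16)

# typing m at the console (or print(m)) renders the interactive map
```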

---
# Dygraphs
http://rstudio.github.io/dygraphs/

---
# DataTables (DT)
http://rstudio.github.io/DT/

---

# R Markdown

R Markdown can be used to turn our analysis into fully reproducible documents that can be shared with others. Output formats include HTML, PDF or Word. 
An R Markdown file is written with Markdown syntax with embedded R code, and can include narrative text, tables and visualizations

https://rmarkdown.rstudio.com/

http://www.paulamoraga.com/book-geospatial/sec-rmarkdown.html

---
# YAML header

```yaml
---
title: "An R Markdown document"
author: "Paula Moraga"
date: "1 July 2019"
output: pdf_document
---
```

---
<div style="margin-top:-25px"></div>

# Markdown syntax

.pull-left[

`**bold text** *italic text*`

**bold text** &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; *italic text*

```rmarkdown
- unordered item
- unordered item
    1. first item
    2. second item
    3. third item
```

- unordered item
- unordered item
    1. first item
    2. second item
    3. third item

]

.pull-right[

```rmarkdown
# First-level header
## Second-level header
### Third-level header
```

# First-level header
## Second-level header
### Third-level header

]

```rmarkdown
$$\int_0^\infty e^{-x^2} dx=\frac{\sqrt{\pi}}{2}$$
```

`$$\int_0^\infty e^{-x^2} dx=\frac{\sqrt{\pi}}{2}$$`

---
# R code chunks

````r
```{r, warning = FALSE}
# R code to be executed
```
````

Options:

- `echo=FALSE` code will not be shown in the document, but it will run and the output will be displayed in the document
- `eval=FALSE` code will not run, but it will be shown in the document
- `include=FALSE` code will run, but neither the code nor the output will be included in the document
- `results='hide'` output will not be shown, but the code will run and will be displayed in the document
- `cache=TRUE` code chunk is not executed if it has been executed before and nothing in the code chunk has changed since then
- `warning=FALSE`, `message=FALSE` suppress warnings or messages; `error=FALSE` stops rendering when a chunk raises an error instead of showing the error in the document
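
---

# R Markdown: a rendering sketch

The pieces above (YAML header, Markdown text, a code chunk with options) can be combined into one small document and rendered from R. This is a sketch under assumptions: the file name `example.Rmd` and its contents are hypothetical, and rendering is skipped unless **rmarkdown** and pandoc are available.

```r
# Build a minimal R Markdown file; the chunk fence is assembled with
# strrep() only to avoid nesting literal fences in this example
fence <- strrep("`", 3)
rmd <- c(
  "---",
  'title: "An R Markdown document"',
  "output: html_document",
  "---",
  "",
  "## Summary of a built-in dataset",
  "",
  paste0(fence, "{r, echo=FALSE, message=FALSE}"),
  "summary(cars)",
  fence
)

# File name is hypothetical; a temporary directory keeps the sketch tidy
path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd, path)

# Render only if rmarkdown and pandoc are available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out <- rmarkdown::render(path, quiet = TRUE)
  out  # path to the generated HTML file
}
```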

---
background-image: url(figures/rmarkdown.png)
background-size: contain
background-position: left 70%

# R Markdown

---

# Interactive dashboards with flexdashboard

The R package **flexdashboard** uses R Markdown to publish a group of related data visualizations as a dashboard

https://rmarkdown.rstudio.com/flexdashboard/   
https://rmarkdown.rstudio.com/flexdashboard/examples.html

http://www.paulamoraga.com/book-geospatial/sec-dashboardswithshiny.html

---
# Layout

---
background-image: url(figures/pm3.gif)
background-size: contain
background-position: left 70%

# Interactive dashboards with flexdashboard

---

# Shiny web applications

**Shiny** is a web application framework for R that makes it possible to build interactive web applications

https://shiny.rstudio.com/

http://www.paulamoraga.com/book-geospatial/sec-shiny.html

Examples

https://shiny.rstudio.com/gallery/single-file-shiny-app.html

https://shiny.rstudio.com/gallery/telephones-by-region.html

---
<div style="margin-top:-40px"></div>

# SpatialEpiApp

Shiny app for interactive viz, disease risk estimation and cluster detection

- Risk estimates by fitting Bayesian models with [INLA](http://www.r-inla.org/)
- Detection of clusters by using the scan statistics in [SaTScan](https://www.satscan.org/)

http://www.paulamoraga.com/software/

```r
library(devtools)
install_github("Paula-Moraga/SpatialEpiApp")
library(SpatialEpiApp)
run_app()
```

---
# Structure of a Shiny App

A Shiny app is a directory that contains an R file called `app.R`

`app.R` has three components:

- `ui` user interface object which controls the layout and appearance of the app
- `server()` function with the instructions to build the objects displayed in the `ui`
- call to `shinyApp()` that creates the Shiny app from the `ui`/`server` pair

---
# Content of app.R

```r
library(shiny)

# define user interface object
ui <- fluidPage( )

# define server() function
server <- function(input, output){ }

# call to shinyApp() which returns the Shiny app
shinyApp(ui = ui, server = server)
```

Save `app.R` inside the `appdir` directory. Launch the app:

```r
library(shiny)
runApp("appdir_path")
```

---
# Inputs

---
# Outputs

Plots, tables, texts, images

---
# Inputs, outputs and reactivity

Inputs: we can interact with the app by modifying their values   
Outputs: objects we want to show in the app

```r
# schematic template: * stands for a specific type,
# e.g. sliderInput(), plotOutput(), renderPlot()
ui <- fluidPage(
  *Input(inputId = "myinput", label = "mylabel", ...),
  *Output(outputId = "myoutput", ...)
)

server <- function(input, output){
  output$myoutput <- render*({
    # code to build the output.
    # If it uses an input value (input$myinput),
    # the output will be rebuilt whenever
    # the input value changes
  })
}
```
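
---

# Inputs, outputs and reactivity: a sketch

A concrete instance of the template, assuming the **shiny** package is installed. The choice of a slider and a histogram, and the ids `n` and `hist`, are arbitrary illustrations.

```r
library(shiny)

ui <- fluidPage(
  # input: number of observations to simulate
  sliderInput(inputId = "n", label = "Sample size",
              min = 10, max = 500, value = 100),
  # output: placeholder filled by output$hist in server()
  plotOutput(outputId = "hist")
)

server <- function(input, output) {
  # rebuilt whenever input$n changes
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = paste("n =", input$n))
  })
}

app <- shinyApp(ui = ui, server = server)
# runApp(app) launches the app in an interactive session
```

Moving the slider changes `input$n`, which invalidates `output$hist` and triggers `renderPlot()` to redraw the histogram.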

---
# Inputs, outputs and reactivity

https://shiny.rstudio.com/gallery/single-file-shiny-app.html

---

class: inverse, middle, center

# Tutorials

---
# R packages for tutorials

```r
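# packages used in the tutorials
# (rgdal was archived from CRAN in 2023; on recent R versions
#  it may need to be installed from the CRAN archive)
install.packages(c("sp", "spdep", "raster", "rgdal",
                   "leaflet", "ggplot2", "geoR",
                   "dplyr", "SpatialEpi"))

# INLA is not on CRAN: add its repository to the default ones
# so that dependencies hosted on CRAN can still be resolved
install.packages("INLA",
  repos = c(getOption("repos"),
            INLA = "https://inla.r-inla-download.org/R/stable"),
  dep = TRUE)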
```

---
# Tutorials

## Introduction

http://www.paulamoraga.com/presentation-geospatial-dataviz

## Modeling

Areal data (lung cancer risk in Pennsylvania, USA)

http://www.paulamoraga.com/tutorial-areal-data

Geostatistical data (malaria in The Gambia)

http://www.paulamoraga.com/tutorial-geostatistical-data

---
# Tutorials

## Creation of dashboards to communicate results

R Markdown

http://www.paulamoraga.com/book-geospatial/sec-rmarkdown.html

Interactive dashboards with **flexdashboard**

http://www.paulamoraga.com/book-geospatial/sec-flexdashboard.html

Example (lung cancer risk in Pennsylvania, USA)

http://www.paulamoraga.com/tutorial-flexdashboard-example

---

# Join my Geospatial Health Surveillance research group at KAUST!

I am hiring excellent Postdocs, PhD and Master's students to join my group

Development of innovative statistical methods and computational tools for disease surveillance that alert public health officials when elevated disease levels are anticipated and provide insights about disease drivers

Collaboration with colleagues in Europe, America and Asia on projects to prevent and control diseases such as COVID-19, dengue and malaria

Excellent research environment, free tuition, monthly living allowance, housing, medical and dental coverage, relocation support

KAUST http://kaust.edu.sa  &nbsp;&nbsp; Statistics Program http://stat.kaust.edu.sa   
Admissions http://admissions.kaust.edu.sa &nbsp;&nbsp; http://studyat.kaust.edu.sa

---

class: inverse

<div style = 'margin-top: 120px; margin-bottom: 40px;'>
<span style = 'font-size: 68px; line-height:1.5; font-weight:bold'>
Thanks!<br>
</span>
</div>

<span style = 'font-size: 38px; font-weight:bold'>
Paula Moraga<br>
</span>

<span style = 'font-size: 28px; line-height:1.5'>
<a href='mailto:pm865@bath.ac.uk' target='_blank'><i class='fa fa-envelope-square fa-fw'></i>&nbsp; pm865@bath.ac.uk</a><br>
<a href='http://twitter.com/Paula_Moraga_' target='_blank'> <i class='fa fa-twitter fa-fw'></i>&nbsp; @Paula\_Moraga\_</a><br>
<a href='https://Paula-Moraga.github.io/' target='_blank'><i class='fa fa-globe fa-fw'></i>&nbsp; www.PaulaMoraga.com</a><br>
</span>