<div class = "content"> <br> <center> <div style = 'margin-top: -80px; margin-bottom: -65px;'> <p class="text-center" style = 'font-size: 52px; line-height:1.5; font-weight:bold'>Spatial Modeling and Visualization with R-INLA</p> </div> </center> <br> <table style="margin:0px; margin-left:-10px; border-top:0; border-bottom:0;"> <tr> <td style="width: 440px;"> <div style = 'padding: 40px; padding-left: 40px; font-size: 32px; font-weight:bold; margin-bottom: -70px;'> <br> Paula Moraga, Ph.D.<br><br> </div> <div style = 'padding: 40px; padding-left: 40px; font-size: 28px; margin-bottom: -70px;'> King Abdullah University of Science and Technology (KAUST)<br><br> </div> <div style = 'padding: 40px; padding-left: 40px; font-size: 26px; line-height:1.5;margin-bottom: 40px;'> <a href='http://twitter.com/Paula_Moraga_' target='_blank'> <i class='fa fa-twitter fa-fw'></i> @Paula_Moraga_</a><br> <a href='https://Paula-Moraga.github.io/' target='_blank'> <i class='fa fa-globe fa-fw'></i> www.PaulaMoraga.com</a><br> <a href='http://bit.ly/presgeos' target='_blank'><i class='fa fa-link fa-fw'></i> http://bit.ly/presgeos</a><br> </div> </td> <td> <center> <img src="./figures/logogeohealth.png" height = "200" alt = "a png"><br><br> <img src="./figures/Statistics at KAUST_Logo for digital use_small.png" height = "160" alt = "a png"> </center> </td> </tr> </table> </div> --- <div style="margin-top:-20px"></div> # Who am I? <div style="margin-top:-20px"></div> <style type="text/css"> .circular--square { border-radius: 50%; } </style> <table style="margin:20px; border-top:0; border-bottom:0;"> <tr> <td> <p style="font-size:25px"><b>Paula Moraga, Ph.D.</b></p> Assistant Professor of Statistics <br>for Public Health at KAUST<br><br> PI Geospatial Statistics and Health Surveillance Research Group<br><br> <img src="./figures/Statistics at KAUST_Logo for digital use_small.png" height = "120" alt = "a png"> <img src="./figures/logogeohealth.png" height = "120" alt = "a png"> </td> <td style="width:40%"> <img class = "circular--square" src="./figures/paula.png" width = "200" alt = "a png"><br><br> <a href='http://twitter.com/Paula_Moraga_' target='_blank'><i class='fa fa-twitter fa-fw'></i> @Paula_Moraga_</a><br> <a href='https://Paula-Moraga.github.io/' target='_blank'><i class='fa fa-globe fa-fw'></i> www.PaulaMoraga.com</a><br> </td> </tr> </table> <br> <i class='fa fa-map-marked-alt fa-fw'></i> Geospatial data analysis, statistical modeling<br> <i class='fa fa-hospital fa-fw'></i> Spatial epidemiology, disease mapping, health surveillance<br> <i class='fa fa-laptop fa-fw'></i> Development of R packages and interactive visualization applications<br> <i class='fa fa-book fa-fw'></i> Author book Geospatial Health Data http://bit.ly/bookgeo<br> --- background-image: url(./figures/overview.png) background-size: contain --- <div style="margin-top:-20px"></div> ## Book <div style="margin-top:-10px"></div> Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny (2019, Chapman & Hall/ CRC Press) http://www.paulamoraga.com/book-geospatial/ .pull-left[ <div style="margin-top:-20px"></div> - Manipulate and transform point, areal and raster data,<br> and create maps with R - Fit and interpret Bayesian spatial and spatio-temporal models with INLA and SPDE - Model disease risk and quantify risk factors in different settings - Interactive vizualizations, reproducible reports, dashboards, and Shiny apps - Health examples but methods useful to analyze georeferenced data in other fields such as ecology or criminology ] .pull-right[ <center> <img src="./figures/bookcover.jpg" style="margin-top:-5px; margin-left:-10px; width:85%;"/> </center> ] <!-- #--- <div style = 'margin-top: -40px;'> # Outline ### Saturday <div style = 'margin-top: -20px;;'> #### 9:00 - 10:15 <div style = 'margin-top: -20px;;'> - Introduction to geospatial data and INLA - Modeling areal data (lung cancer risk in Pennsylvania, USA)<br> http://www.paulamoraga.com/tutorial-areal-data #### 10:45 - 12:00 <div style = 'margin-top: -20px;;'> - Dashboards to communicate results<br> http://www.paulamoraga.com/tutorial-flexdashboard<br> ### Sunday <div style = 'margin-top: -20px;;'> #### 9:00 - 10:15 <div style = 'margin-top: -20px;;'> - Modeling geostatistical data (malaria prevalence in The Gambia)<br> http://www.paulamoraga.com/tutorial-geostatistical-data #### 10:45 - 12:00 <div style = 'margin-top: -20px;;'> - Modeling point patterns (sloths occurrence in Costa Rica)<br> http://www.paulamoraga.com/tutorial-point-patterns --> --- class: inverse, center, middle # Spatial Modeling and Visualization<br> with R-INLA --- # Outline We will learn how to develop and interpret spatial models with R-INLA (http://www.r-inla.org), and how to create maps and other visualizations that facilitate the communication with collaborators and policymakers.<br> The course is hands-on with three parts: - Introduction to geospatial data and R-INLA - Modeling areal data (lung cancer risk in Pennsylvania, USA)<br> http://www.paulamoraga.com/tutorial-areal-data - Modeling geostatistical data (malaria prevalence in The Gambia)<br> http://www.paulamoraga.com/tutorial-geostatistical-data We will focus on health applications but the methods covered are applicable to many other fields that deal with georeferenced data such as ecology, environment or criminology --- # R packages ```r install.packages(c("sp", "spdep", "raster", "rgdal", "rgeos", "ggplot2", "leaflet", "DT", "dplyr", "SpatialEpi", "geoR")) install.packages("INLA", repos = "https://inla.r-inla-download.org/R/stable", dep = TRUE) ``` --- class: inverse, center, middle # Geospatial data and methods <!-- for<br> disease surveillance --> --- <img src="figures/snow-cholera-map.jpg" width="90%" style="display: block; margin: auto;" /> --- background-image: url(figures/snow-cholera-map-pump-top-margin.png) background-size: contain <div style="margin-top:-20px"></div> # John Snow's map of cholera, London, 1854 <!-- #--- ## Geospatial modeling and visualization with R - Methods to analyze geospatial health data that enable to quantify disease burden, understand geographic and temporal patterns, identify risk factors, and measure inequalities <img src="./figures/typesspatialdata.png" style="width:100%;"/> - Maps and other visualizations that enable to represent disease risk and risk factors, and presentation options such as interactive dashboards to communicate results - Examples focus on geospatial health data but the methods are also useful in others fields that use georeferenced data including epidemiology, ecology, demography and criminology --> --- # Geospatial methods for disease surveillance Geospatial methods use data on disease cases, individuals at risk, and risk factors such as demographic and environmental factors to - Understand geographic and temporal patterns - Highlight areas of high risk and detect clusters - Identify potential risk factors - Measure inequalities - Quantify the excess of disease risk close to a putative source - Early detection of outbreaks 🗣 Maps and other visualizations enable to represent disease risk and risk factors, and interactive dashboards facilitate the communication of results ✍️ Results guide decision makers to better allocate limited resources and to design strategies for disease prevention and control 🌍 Many of these methods are also useful to analyze georeferenced data in others fields such as ecology, environment and criminology <!-- **Data**: Methods use information about disease cases, individuals at risk, and potential risk factors such as demographic and environmental factors **Disease models** describe the variability in disease risk as a function of explanatory variables and random effects to account for unexplained variability. Models allow straightforward extensions to estimate covariate effects, and handle spatio-temporal data and multiple diseases --> --- background-image: url(figures/arealgeostatisticalpointpatterns.png) background-size: contain background-position: left 50% <div style="margin-bottom:-20px"></div> # Types of spatial data <div style="margin-bottom:500px"></div> [Moraga and Lawson, Computational Statistics & Data Analysis, 2012](https://doi.org/10.1016/j.csda.2011.11.011) [Moraga et al., Parasites & Vectors, 2015](https://doi.org/10.1186/s13071-015-1166-x) [Moraga and Montes, Statistics in Medicine, 2011](https://doi.org/10.1002/sim.4160) --- <div style="margin-top:-40px"></div> # Areal data <div style="margin-top:-40px"></div> <img src="./figures/georgiaLBW.png" width="40%" style="display: block; margin: auto;" /> <div style="margin-top:-10px"></div> **Standardized Mortality Ratio (SMR)** is often used to estimate disease risk <div style="margin-top:-10px"></div> `$$SMR \mbox{ in area } i = SMR_i = \frac{Y_i}{E_i} = \frac{\mbox{number observed cases in area } i}{\mbox{number expected cases in area } i}$$` `\(SMR_i = 1\)` `\(\rightarrow\)` same number of cases observed as expected `\(SMR_i > 1\)` `\(\rightarrow\)` more number of cases observed than expected (high risk) `\(SMR_i < 1\)` `\(\rightarrow\)` less number of cases cases observed than expected (low risk) **Limitations** <div style="margin-top:-10px"></div> SMRs may be misleading and unreliable in areas with small populations or rare diseases. Models enable to incorporate covariates and borrow information from neighboring areas to obtain smoothed relative risks --- <div style="margin-top:-40px"></div> # Areal models Model to estimate disease relative risk `\(\theta_i\)` in areas `\(i=1,\ldots,n\)` `$$Y_i|\theta_i \sim Poisson(E_i \times \theta_i)$$` `$$\log(\theta_i) = \boldsymbol{z}_i \boldsymbol{\beta} + u_i + v_i$$` - `\(Y_i\)`: number observed cases in area `\(i\)` - `\(E_i\)`: number expected cases in area `\(i\)` - `\(\theta_i\)`: relative risk in area `\(i\)` Fixed effects quantify the effects of the covariates on the disease risk - `\(\boldsymbol{z}_i = (1, z_{i1}, \ldots, z_{ip})\)` vector of the intercept and covariates `\(\beta = (\beta_0, \beta_1, \ldots, \beta_p)'\)` coefficient vector Random effects represent residual variation which is not explained by the available covariates - `\(u_i\)`: structured spatial effect to account for the spatial dependence between relative risks (areas that are close show more similar risk than areas that are not close) - `\(v_i\)`: unstructured spatial effect to account for independent noise --- <div style="margin-top:-10px"></div> # Geostatistical data <center> <div style="margin-left:-70px; margin-right:-20px; width:120%;"> <img src="./figures/mbg.png" style="width:100%;"/> </div> </center> --- <div style="margin-top: -45px;"> # Geostatistical models <div style="margin-top: -20px;"> Models to predict prevalence at unsampled locations `$$Y_i|P(\boldsymbol{x}_i)\sim \mbox{Binomial} (n_i, P(\boldsymbol{x}_i))$$` `$$\mbox{logit}(P(\boldsymbol{x}_i)) = \boldsymbol{z}_i \boldsymbol{\beta} + S(\boldsymbol{x}_i) + u_i$$` `\(Y_i\)` number people positive, `\(n_i\)` number people tested, `\(P(x_i)\)` prevalence at `\(\boldsymbol{x}_i\)` <div style="margin-top: -10px;"> Covariates based on characteristics known to affect disease transmission (temperature, precipitation, vegetation, elevation, population density, etc.) <div style="margin-top: -10px;"> <!-- (temperature, precipitation, vegetation, elevation, distance to water bodies, urbanization, land cover, population density, etc) Random effects (Gaussian process with Matern covariate funcion, Independent risk) --> Random effects model residual variation not explained by covariates <center> <img src="./figures/covgrf.png" style="width:80%;"/> </center> --- <div style="margin-top:-40px"></div> # Point patterns <div style="margin-top:-100px"></div> <center> <div style="margin-left:70px; width:40%;"> <img src="./figures/pointdataKidney.png"/> </div> </center> <div style="margin-top:-20px"></div> Assume point pattern `\(\{s_i: i=1, \ldots, n\}\)` has been generated as a realization of a point process `\(Z = \{Z(s): s \in \mathbb{R}^2\}\)` Point processes are stochastic models that describe the locations of events of interest and possibly some additional information such as marks that inform about different types of events (e.g., cases and controls) A point process model can be used to identify patterns in the distribution of the observed locations, estimate the intensity of events, and learn about the correlation between the locations and spatial covariates --- <div style = 'margin-top: -30px'/> # Coordinate Reference Systems (CRS) 1. **unprojected or geographic**: Latitude and Longitude for referencing location on the ellipsoid Earth (decimal degrees (DD) or degrees, minutes, and seconds (DMS)) 2. **projected**: Easting and Northing for referencing location on 2-dimensional representation of Earth Common projection: [Universal Transverse Mercator (UTM)](https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system) Location is given by the zone number (60 zones), hemisphere (north or south), and Easting and Northing coordinates in the zone in meters<br> <img src="./figures/geographic.png" width="34%" /> <img src="./figures/Utm-zones.jpg" width="64%" /> --- class: inverse, center, middle # Integrated nested Laplace approximation (INLA) --- <div style = 'margin-top: -40px'/> # INLA <div style = 'margin-top: -20px'/> Integrated nested Laplace approximation (INLA) is a computational approach to perform approximate Bayesian inference in **latent Gaussian models** such as generalized linear mixed models and spatial and spatio-temporal models INLA uses a combination of analytical approximations and numerical integration to obtain approximated posterior distributions of the parameters and is very fast compared to MCMC <!-- - The main advantage of INLA is the huge improvement of speed compared to MCMC alternatives * Laplace approximations to obtain posterior marginals * Numerical algorithms for sparse matrices --> The combination of INLA and the stochastic partial differential equation (SPDE) approaches permits to analyze point-level data INLA and SPDE can be applied with the `R` package `R-INLA` **Resources** INLA website http://www.r-inla.org/ Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny. Moraga, P. CRC Press, 2019<br> https://www.paulamoraga.com/book-geospatial/ Advanced Spatial Modeling with Stochastic Partial Differential Equations Using R and INLA. Krainski, E., et al. CRC Press, 2019<br> https://becarioprecario.bitbucket.io/spde-gitbook/ --- <div style = 'margin-top: -40px'> # Latent Gaussian models <div style = 'margin-top: -20px'> **Observations** belong to an exponential family with mean `\(\mu_i=g^{-1}(\eta_i)\)`<br> (e.g. Gaussian, Poisson, Binomial, ...) `$$y_i|\boldsymbol{x},\boldsymbol{\theta} \sim \pi(y_i|x_i,\boldsymbol{\theta})\ i=1, \ldots, n$$` **Latent Gaussian field** `$$\boldsymbol{x}|\boldsymbol{\theta}\sim N(\boldsymbol{\mu(\theta)},\boldsymbol{Q(\theta)}^{-1})$$` **Hyperparameters** (not necessarily Gaussian) `$$\boldsymbol{\theta} \sim \pi(\boldsymbol{\theta})$$` **Linear predictor** accounts for effects of covariates in an additive way `$$\eta_i=\alpha+\sum_{k=1}^{n_{\beta}}\beta_k z_{ki}+\sum_{j=1}^{n_f}f^{(j)}(u_{ji})$$` `\(\alpha\)` is the intercept, `\(\{\beta_k\}\)`'s quantify the linear effects of covariates `\(\{z_{ki}\}\)` on the response, `\(\{f^{(j)}(\cdot)\}\)`'s are a set of random effects defined in terms of some covariates `\(\{u_{ji}\}\)` (e.g. iid, rw1, ar1, CAR, ...) Latent Gaussian variables `\(\boldsymbol{x}=(\alpha,\{\beta_k\},\{f^{(j)}\})|\boldsymbol{\theta}\sim N(\boldsymbol{\mu(\theta)},\boldsymbol{Q(\theta)}^{-1})\)` <!-- #--- # INLA Posterior `$$\pi(\mathbf{x},\boldsymbol{\theta}|\mathbf{y})\propto \pi(\boldsymbol{\theta}) \pi(\mathbf{x}|\boldsymbol{\theta}) \prod \pi(y_i|x_i,\boldsymbol{\theta})$$` `\(\mathbf{x}\)` denote the vector of all latent Gaussian variables `\(\boldsymbol{\theta}\)` is the vector of hyperparameters which are not necessarily Gaussian Compute the posterior marginals for the latent field `$$\pi(x_i|\mathbf{y}),\ i=1,\ldots,n$$` Compute the posterior marginals for the hyperparameters `$$\pi(\theta_j|\mathbf{y}),\ j=1,\ldots,dim(\boldsymbol{\theta})$$` --> --- <div style = 'margin-top: -40px'/> # INLA Compute the posterior marginals for the latent Gaussian field and the hyperparameters `$$\pi(x_i|\boldsymbol{y})=\int \pi(x_i|\boldsymbol{\theta},\boldsymbol{y})\pi(\boldsymbol{\theta}|\boldsymbol{y})d\boldsymbol{\theta},\ \pi(\theta_j|\boldsymbol{y})=\int \pi(\boldsymbol{\theta}|\boldsymbol{y})d\boldsymbol{\theta}_{-j}$$` Use this form to construct **nested approximations** `$$\tilde \pi(x_i|\boldsymbol{y})=\int \tilde \pi(x_i|\boldsymbol{\theta},\boldsymbol{y})\tilde \pi(\boldsymbol{\theta}|\boldsymbol{y})d\boldsymbol{\theta},\ \tilde \pi(\theta_j|\boldsymbol{y})=\int \tilde \pi(\boldsymbol{\theta}|\boldsymbol{y})d\boldsymbol{\theta}_{-j}$$` This approximation can be **integrated** numerically with respect to `\(\boldsymbol{\theta}\)` `$$\tilde \pi(x_i|\boldsymbol{y})=\sum_k \tilde \pi(x_i|\boldsymbol{\theta_k},\boldsymbol{y})\tilde \pi(\boldsymbol{\theta_k}|\boldsymbol{y})\times \Delta_k$$` `$$\tilde \pi(\theta_j|\boldsymbol{y})=\sum_l \tilde \pi(\boldsymbol{\theta_l^*}|\boldsymbol{y})\times \Delta_l^*$$` `\(\Delta_k\)` area weight corresponding to `\(\boldsymbol{\theta_k}\)`<br> `\(\Delta_l^*\)` area weight corresponding to `\(\boldsymbol{\theta_l^*}\)` --- # INLA The approximated posterior distributions `\(\tilde \pi (x_i|\boldsymbol{y})\)` can be post-processed to compute quantities of interest like posterior expectations and quantiles <table style = "border-top:0; border-bottom:0;"> <tr><td> Expectation </td><td> 95% C.I. </td></tr> <tr> <td><img src="./figures/inlapostprocessing1.png"/></td> <td><img src="./figures/inlapostprocessing2.png"/></td> </tr></table> --- # R-INLA package Install `INLA` ```r install.packages("INLA", repos = "https://inla.r-inla-download.org/R/stable", dep = TRUE) library(INLA) ``` --- <div style = 'margin-top: -40px'/> # R-INLA package Model `$$Y_i | \eta_i, \sigma^2 \sim N(\eta_i, \sigma^2), i = 1, \ldots, n$$` `$$\eta_i = \beta_0 + \beta_1 \times x_{1i} + \beta_2 \times x_{2i} + u_i,\ u_i \sim N(0, \sigma^2_u)$$` 1 Write the linear predictor as a formula object in R ```r formula <- y ~ x1 + x2 + f(id, model = "iid") ``` <small> Random effects specified with `f()`. First argument `id` is an index vector that specifies the element of the random effect that applies to each observation, second argument model name </small> 2 Call the function `inla()` ```r res <- inla(formula, family = "gaussian", data = d) ``` 3 Inspect the `res` object wich contains the fitted model.<br> Posteriors can be post-processed using a set of functions provided by R-INLA ```r summary(res) res$summary.fixed ``` <!-- DIAGNOSTICS Cross-validation. Data for validation. Compare prediction with real data (correlation, MSE, coverage probabilities) The package also provides estimates of different criteria to assess and compare Bayesian models. These include the model deviance information criterion (DIC) (Spiegelhalter et al. 2002), the Watanabe-Akaike information criterion (WAIC) (Watanabe 2010), the marginal likelihood, and the conditional predictive ordinates (CPO) (Held, Schrödle, and Rue 2010). OTHER MODELS WITH INLA Survival model (time to an event, time to death) Zero-inflated models Repeated measurements, random effects DIFFERENCE WITH GAM Easy to quantify uncertainty GEOSTATISTICAL DATA Triangulation guidelines. Results change with different triangulations Units. Same data. Degrees or meters Length triangels need to be smaller than the range (distance such that correlation between values is 0) In the triangulation add extension to avoid boundary effects where the variance increases near the border. Sometimes we want to respect borders BYM scale.model = TRUE Make precision parameters of models with different priors comparable neff parameters (independent parameters) Equivalente replicates: #data/neff parameters POINT PATTERN Stationary. Invariant to shifts (s1,..., sn) Z(s1),... Z(sn) Distribution Z(s1+h),... Z(sn+h) same in translation inla.doc("rw2d") Celcius degrees * 10 cellarea = 0.1 * 0.1 decimal degree = 0.01 --> --- class: inverse, center, middle # Areal data --- <div style="margin-top:-40px"></div> # Areal data <div style="margin-top:-40px"></div> <img src="./figures/georgiaLBW.png" width="40%" style="display: block; margin: auto;" /> <div style="margin-top:-10px"></div> **Standardized Mortality Ratio (SMR)** is often used to estimate disease risk <div style="margin-top:-10px"></div> `$$SMR \mbox{ in area } i = SMR_i = \frac{Y_i}{E_i} = \frac{\mbox{number observed cases in area } i}{\mbox{number expected cases in area } i}$$` `\(SMR_i = 1\)` `\(\rightarrow\)` same number of cases observed as expected `\(SMR_i > 1\)` `\(\rightarrow\)` more number of cases observed than expected (high risk) `\(SMR_i < 1\)` `\(\rightarrow\)` less number of cases cases observed than expected (low risk) **Example** `\(SMR_i = \frac{Y_i}{E_i} = \frac{200}{100} = 2 > 1\)` `\(\rightarrow\)` area `\(i\)` high risk (observed `\(>\)` expected) `\(SMR_i = \frac{Y_i}{E_i} = \frac{100}{200} = 0.5 < 1\)` `\(\rightarrow\)` area `\(i\)` low risk (observed `\(<\)` expected) --- <div style="margin-top:-20px"></div> ### Indirect standardization to calculate expected cases <div style="margin-top:-10px"></div> The expected number of cases in area `\(i\)` is the number of cases one would expect if the population in area `\(i\)` behaved the way the standard population behaves Standard population is considered as the whole population (all areas) Population is stratified by several factors (e.g., age and sex) `$$E_i = \sum_{j=1}^m r_j^{(std)} n_j^{(i)}$$` - `\(r_j^{(std)}=\frac{\mbox{number cases}}{\mbox{population}}\)` rate in stratum `\(j\)` in the standard population - `\(n_j^{(i)}\)`: population in stratum `\(j\)` of area `\(i\)` <div style="margin-top:10px"></div> ### Limitations <div style="margin-top:-10px"></div> SMRs may be misleading and unreliable in areas with small populations or rare diseases. Models enable to incorporate covariates and borrow information from neighboring areas to obtain smoothed relative risks --- <div style="margin-top:-40px"></div> # Areal models Model to estimate disease relative risk `\(\theta_i\)` in areas `\(i=1,\ldots,n\)` `$$Y_i|\theta_i \sim Poisson(E_i \times \theta_i)$$` `$$\log(\theta_i) = \boldsymbol{z}_i \boldsymbol{\beta} + u_i + v_i$$` - `\(Y_i\)`: number observed cases in area `\(i\)` - `\(E_i\)`: number expected cases in area `\(i\)` - `\(\theta_i\)`: relative risk in area `\(i\)` Fixed effects quantify the effects of the covariates on the disease risk - `\(\boldsymbol{z}_i = (1, z_{i1}, \ldots, z_{ip})\)` vector of the intercept and covariates `\(\beta = (\beta_0, \beta_1, \ldots, \beta_p)'\)` coefficient vector Random effects represent residual variation which is not explained by the available covariates - `\(u_i\)`: structured spatial effect to account for the spatial dependence between relative risks (areas that are close show more similar risk than areas that are not close) - `\(v_i\)`: unstructured spatial effect to account for independent noise --- <div style = 'margin-top: -40px; margin-bottom: -20px;'> # Tutorial ### Modeling Areal data (lung cancer risk in Pennsylvania, USA) http://www.paulamoraga.com/tutorial-areal-data <center> <img src="./figures/tutorialarealdata.png" style="width:70%;"/> </center> --- class: inverse, center, middle # Geostatistical data --- <div style="margin-top:-10px"></div> # Geostatistical data <center> <div style="margin-left:-70px; margin-right:-20px; width:120%;"> <img src="./figures/mbg.png" style="width:100%;"/> </div> </center> --- <div style="margin-top: -45px;"> # Geostatistical models <div style="margin-top: -20px;"> Models to predict prevalence at unsampled locations `$$Y_i|P(\boldsymbol{x}_i)\sim \mbox{Binomial} (n_i, P(\boldsymbol{x}_i))$$` `$$\mbox{logit}(P(\boldsymbol{x}_i)) = \boldsymbol{z}_i \boldsymbol{\beta} + S(\boldsymbol{x}_i) + u_i$$` `\(Y_i\)` number people positive, `\(n_i\)` number people tested, `\(P(x_i)\)` prevalence at `\(\boldsymbol{x}_i\)` <div style="margin-top: -10px;"> Covariates based on characteristics known to affect disease transmission (temperature, precipitation, vegetation, elevation, population density, etc.) <div style="margin-top: -10px;"> <!-- (temperature, precipitation, vegetation, elevation, distance to water bodies, urbanization, land cover, population density, etc) Random effects (Gaussian process with Matern covariate funcion, Independent risk) --> Random effects model residual variation not explained by covariates <center> <img src="./figures/covgrf.png" style="width:80%;"/> </center> --- <div style="margin-top:-30px;"></div> # Stochastic partial differential equation approach (SPDE) <!-- INLA uses a combination of analytical approximation and numerical integration to do approximate Bayesian inference in latent Gaussian models. which includes a large class of models ranging from generalized linear mixed to spatial and spatio-temporal models. --> In the SPDE approach, the continuously indexed Gaussian field `\(S\)` is represented as a discretely indexed Gaussian Markov random field (GMRF) by means of a finite basis function defined on a triangulation of the region of study <div style="margin-top:-10px;"></div> .pull-left[ `$$S(\mathbf{x})=\sum_{g=1}^G \psi_g(\mathbf{x}) S_g,$$` - `\(\psi_g(\cdot)\)` piecewise polynomial basis functions on each triangle - `\(\{S_g\}\)` zero-mean Gaussian distributed - `\(G\)` number of vertices in the triangulation ] .pull-right[ <img src="./figures/pointsandtriang.png" width="100%" style="display: block; margin: auto;" /> ] --- # Projection matrix `\(S(\mathbf{x})\)` weighted average of the values of the GMRF in the vertices of the triangle containing the location of the observation. <br> Weights = barycentric coordinates `$$S(\mathbf{x}) \approx \frac{A_{1}}{A}S_1 + \frac{A_{2}}{A}S_2 + \frac{A_{3}}{A}S_3$$` <img src="./figures/triangleweights.png" width="40%" style="display: block; margin: auto;" /> --- # Projection matrix `$$S(\mathbf{x}_i) \approx \sum_{g=1}^G A_{ig}S_g$$` where `\(A\)` is a projection matrix that maps the GMRF from the observations to the triangulation nodes Row `\(i\)` in `\(A\)` corresponding to observation `\(i\)` has possibly three non-zero values at columns that represent the vertices of the triangle containing the point (non-zero values are equal to the barycentric coordinates) `$$A = \left(\begin{array}{ccccc} A_{11} & A_{12} & A_{13} & \dots & A_{1G} \\ A_{21} & A_{22} & A_{23} & \dots & A_{2G} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ A_{n1} & A_{n2} & A_{n3} & \dots & A_{nG} \end{array}\right) = \left(\begin{array}{ccccc} 1 & 0 & 0 & \dots & 0\\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ .2 & .2 & 0 & \dots & .6 \\ \end{array}\right)$$` --- <div style = 'margin-top: -40px; margin-bottom: -20px;'></div> # Tutorial ### Modeling Geostatistical data (malaria prevalence in The Gambia) http://www.paulamoraga.com/tutorial-geostatistical-data <center> <img src="./figures/tutorialgeostatisticaldata.png" style="width:70%;"/> </center> <!-- #--- class: inverse, center, middle # Point patterns #--- # Point patterns <div style="margin-top:-100px"></div> <center> <div style="margin-left:70px; width:40%;"> <img src="./figures/pointdataKidney.png"/> </div> </center> Assume point pattern `\(\{s_i: i=1, \ldots, n\}\)` has been generated as a realization of a point process `\(Z = \{Z(s): s \in \mathbb{R}^2\}\)` A point process model can be used to identify patterns in the distribution of the observed locations, estimate the intensity of events, and learn about the correlation between the locations and spatial covariates #--- # Point processes Point processes are stochastic models that describe the locations of events of interest and possibly some additional information such as marks that inform about different types of events (e.g., cases and controls) Intensity of the process: mean number of events per unit area at location s `$$\lambda(s) = lim_{|ds|\rightarrow 0} \frac{E[N(ds)]}{|ds|}$$` <img src="pp.png" width="90%" style="display: block; margin: auto;" /> #--- <div style="margin-top:-20px"></div> # Homogeneous Poisson process <div style="margin-top:-20px"></div> The simplest point process model is the homogeneous Poisson process 1. the number of events in any region `\(A\)` follows a Poisson distribution with mean `\(\lambda |A|\)`, where `\(\lambda\)` is a constant value denoting the intensity and `\(|A|\)` is the area of `\(A\)` 2. given that there are `\(n\)` events inside `\(A\)`, the locations of these events are independent and uniformly distributed in `\(A\)` Thus, if a point pattern arises as a realization of an homogeneous Poisson process, an event is equally likely to occur at any location within the study region, regardless of the locations of other events <div style="margin-top:-20px"></div> <img src="index_files/figure-html/unnamed-chunk-18-1.png" width="40%" style="display: block; margin: auto;" /> #--- # Log-Gaussian Cox process A log-Gaussian Cox process (LGCP) is a Poisson process with a varying intensity which is itself a stochastic process of the form `$$\Lambda(s) = exp(\eta(s))$$` where `\(\eta = \{\eta(s): s \in \mathbb{R}^2\}\)` is a Gaussian process Conditional on `\(\Lambda(s)\)`, the point process is a Poisson process with intensity `\(\Lambda(s)\)` 1. the number of events in any region `\(A\)` is distributed as `\(Poisson(\int_A \Lambda(s)ds)\)` 2. the locations of events are an independent random sample with probability density proportional to `\(\Lambda(s)\)` Log-Gaussian Cox processes are typically used to model phenomena that are environmentally driven #--- <div style="margin-top:-40px"></div> # Fitting a log-Gaussian Cox process model <div style="margin-top:-20px"></div> Assume point pattern `\(\{s_i: i=1, \ldots, n\}\)` has been generated as a realization of a LGCP with intensity `\(\Lambda(s)= exp(\eta(s))\)` A LGCP can be fitted by discretizing the study region into a grid with `\(n_1 \times n_2\)` cells `\(\{s_{ij}: i=1,\ldots,n_1, j=1,\ldots,n_2\}\)`. `\(|s_{ij}|\)` area of cell `\(s_{ij}\)` <div style="margin-top:-30px"></div> <img src="index_files/figure-html/unnamed-chunk-19-1.png" width="30%" style="display: block; margin: auto;" /> <div style="margin-top:-30px"></div> Mean number of events in cell `\(s_{ij}\)`: `\(\Lambda_{ij}=\int_{s_{ij}} exp(\eta(s))ds \approx |s_{ij}| exp(\eta_{ij})\)` `$$y_{ij}|\eta_{ij} \sim Poisson(|s_{ij}| exp(\eta_{ij}))$$` `$$\eta_{ij} = \beta_0 + \beta_1 \times cov(s_{ij}) + f_s(s_{ij}) + f_u(s_{ij})$$` - `\(y_{ij}\)` observed number of locations in grid cell `\(s_{ij}\)` - `\(\beta_0\)` intercept, `\(cov(s_{ij})\)` covariate at `\(s_{ij}\)`, `\(\beta_1\)` coefficient of `\(cov(s_{ij})\)` - `\(f_s()\)` spatially structured random effect (CAR model on a regular lattice) - `\(f_u()\)` unstructured random effect #--- <div style = 'margin-top: -40px; margin-bottom: -20px;'> # Tutorial ### Modeling Point patterns (sloths occurrence in Costa Rica) http://www.paulamoraga.com/tutorial-point-patterns `spocc` package: https://docs.ropensci.org/spocc/ <center> <img src="./figures/sloth.jpg" style="width:35%;"/> </center> --> --- class: middle, inverse # Interactive visualizations and dashboards <br> to communicate results <br> ## HTML widgets ## R Markdown ## flexdashboard ## Shiny --- class: center, middle, inverse # HTML widgets --- # HTML widgets HTML widgets are interactive web visualizations built with JavaScript http://www.htmlwidgets.org/ http://www.htmlwidgets.org/showcase_leaflet.html --- <div style = 'margin-top: -40px; margin-bottom: -40px;'> # Leaflet http://rstudio.github.io/leaflet/ Basemaps: http://leaflet-extras.github.io/leaflet-providers/preview/index.html <!-- addTiles() https://rstudio.github.io/leaflet/basemaps.html Interactive map over time https://github.com/socib/Leaflet.TimeDimension#examples-and-basic-usage https://github.com/timelyportfolio/leaftime https://kepler.gl/demo/earthquakes https://kepler.gl/demo/map?mapUrl=https://dl.dropboxusercontent.com/s/cq7d5wmf7kfq3bl/keplergl_hu7426c.json https://github.com/SymbolixAU/mapdeck/issues/221 https://symbolixau.github.io/mapdeck/articles/mapdeck.html --> <img src="./figures/htmlwidgetsleaflet.png" width="80%" style="display: block; margin: auto;" /> --- # Dygraphs http://rstudio.github.io/dygraphs/ <img src="./figures/htmlwidgetsdygraphs.png" width="80%" style="display: block; margin: auto;" /> --- # DataTable http://rstudio.github.io/DT/ <img src="./figures/htmlwidgetsDT.png" width="80%" style="display: block; margin: auto;" /> --- class: center, middle, inverse # Reproducible documents with<br> R Markdown --- <div style="margin-top:-30px"></div> # R Markdown <div style="margin-top:-10px"></div> R Markdown can be used to turn our analysis into fully reproducible documents that can be shared with others. Output formats include HTML, PDF or Word. An R Markdown file is written with Markdown syntax with embedded R code, and can include narrative text, tables and visualizations https://rmarkdown.rstudio.com/ http://www.paulamoraga.com/book-geospatial/sec-rmarkdown.html <img src="./figures/rmarkdown.png" width="80%" style="display: block; margin: auto;" /> --- # YAML header ```r --- title: "An R Markdown document" author: "Paula Moraga" date: "1 July 2019" output: pdf_document --- ``` --- <div style="margin-top:-25px"></div> # Markdown syntax .pull-left[ `**bold text** *italic text*` **bold text** *italic text* ```rmarkdown - unordered item - unordered item 1. first item 2. second item 3. third item ``` - unordered item - unordered item 1. first item 2. second item 3. third item ] .pull-right[ ```rmarkdown # First-level header ## Second-level header ### Third-level header ``` # First-level header ## Second-level header ### Third-level header ] ```rmarkdown $$\int_0^\infty e^{-x^2} dx=\frac{\sqrt{\pi}}{2}$$ ``` `$$\int_0^\infty e^{-x^2} dx=\frac{\sqrt{\pi}}{2}$$` --- # R code chunks ````r ```{r, warning = FALSE} # R code to be executed ``` ```` Options: - `echo=FALSE` code will not be shown in the document, but it will run and the output will be displayed in the document - `eval=FALSE` code will not run, but it will be shown in the document - `include=FALSE` code will run, but neither the code nor the output will be included in the document - `results='hide'` output will not be shown, but the code will run and will be displayed in the document - `cache=TRUE` code chunk is not executed if it has been executed before and nothing in the code chunk has changed since then - `error=FALSE`, `warning=FALSE`, `message=FALSE` supress errors, warnings or messages --- background-image: url(figures/rmarkdown.png) background-size: contain background-position: left 70% <div style="margin-top:-30px"></div> # R Markdown --- class: center, middle, inverse # Interactive dashboards with<br> flexdashboard --- <div style="margin-top:-25px"></div> # Interactive dashboards with flexdashboard <div style="margin-top:-15px"></div> The R package **flexdashboard** uses R Markdown to publish a group of related data visualizations as a dashboard https://rmarkdown.rstudio.com/flexdashboard/ https://rmarkdown.rstudio.com/flexdashboard/examples.html http://www.paulamoraga.com/book-geospatial/sec-flexdashboard.html http://www.paulamoraga.com/book-geospatial/sec-dashboardswithshiny.html <img src="./figures/pm3.gif" width="90%" style="display: block; margin: auto;" /> --- # Layout <img src="./figures/flexdashboardlayout3.png" width="100%" style="display: block; margin: auto;" /> --- background-image: url(figures/pm3.gif) background-size: contain background-position: left 70% <div style="margin-top:-20px"></div> # Interactive dashboards with flexdashboard --- class: center, middle, inverse # Shiny web applications --- # Shiny web applications **Shiny** is a web application framework for R that enables to build interactive web applications https://shiny.rstudio.com/ http://www.paulamoraga.com/book-geospatial/sec-shiny.html Examples https://shiny.rstudio.com/gallery/single-file-shiny-app.html https://shiny.rstudio.com/gallery/telephones-by-region.html --- <div style="margin-top:-40px"></div> # SpatialEpiApp <div style="margin-top:-10px"></div> Shiny app for disease risk estimation, cluster detection, and interactive viz - Risk estimates by fitting Bayesian models with [INLA](http://www.r-inla.org/) - Detection of clusters by using the scan statistics in [SaTScan](https://www.satscan.org/) http://www.paulamoraga.com/software/ ```r library(devtools) install_github("Paula-Moraga/SpatialEpiApp") library(SpatialEpiApp) run_app() ``` <img src="./figures/animation.gif" width="100%" style="display: block; margin: auto;" /> --- <div style="margin-top:-40px"></div> # Structure of a Shiny App <div style="margin-top:-10px"></div> A Shiny app can be built by creating a directory (called, for example, `appdir`) that contains an R file (called, for example, `app.R`) with three components: - `ui` user interface object which controls layout and appearance of the app - `server()` function with instructions to build objects displayed in the `ui` - call to `shinyApp()` that creates the Shiny app from the `ui`/`server` pair ```r # define user interface object ui <- fluidPage( ) # define server() function server <- function(input, output){ } # call to shinyApp() which returns the Shiny app shinyApp(ui = ui, server = server) ``` Save `app.R` inside directory `appdir`. Launch the app: ```r library(shiny) runApp("appdir_path") ``` Directory `appdir` can also contain data or other R scripts needed by the app. We can also write two separate files `ui.R` and `server.R`. This permits an easier management of the code for large apps --- # Inputs <img src="./figures/basicwidgets.png" width="100%" style="display: block; margin: auto;" /> --- # Outputs Plots, tables, texts, images <img src="./figures/outputs.png" width="100%" style="display: block; margin: auto;" /> --- <div style="margin-top:-40px"></div> # Inputs, outputs and reactivity <div style="margin-top:-20px"></div> Inputs: we can interact with the app by modifying their values Outputs: objects we want to show in the app ```r ui <- fluidPage( *Input(inputId = myinput, label = mylabel, ...) *Output(outputId = myoutput, ...) ) server <- function(input, output){ output$myoutput <- render*({ # code to build the output. # If it uses an input value (input$myinput), # the output will be rebuilt whenever # the input value changes })} ``` We can include inputs by writing an `*Input()` function in the `ui`<br> (e.g., `textInput()`, `dateRangeInput()`, `fileInput()`) Outputs can be created by writing an `*Output()` function in the `ui`, and the R code to build the output inside a `render*()` function in `server()` (e.g., `textOutput()` and `renderText()`, `tableOutput()` and `renderTable()`) `fluidPage()` creates a display that automatically adjusts the app to the dimensions of the browser window --- # Inputs, outputs and reactivity https://shiny.rstudio.com/gallery/single-file-shiny-app.html <img src="./figures/exampleappR.png" width="100%" style="display: block; margin: auto;" /> --- # Sharing a Shiny app Option 1: Share R scripts with other users - users need R ```r library(shiny) runApp("appdir_path") ``` <div style="margin-top:60px"></div> Option 2: Host app as a web page at its own URL - users do not need R - app can be navigated through the internet with a web browser - host apps on own servers or using one of the ways RStudio offers such as [shinyapps.io](http://www.shinyapps.io/) and [Shiny Server](https://www.rstudio.com/products/shiny/shiny-server/) Example: https://paulamoraga.shinyapps.io/spatialepiapp/ --- <div style = 'margin-top: -40px; margin-bottom: -20px;'> # Tutorial ### Dashboards to communicate results <div style = 'margin-top: -20px; margin-bottom: -20px;'> Interactive dashboards with **flexdashboard** http://www.paulamoraga.com/tutorial-flexdashboard Example (lung cancer risk in Pennsylvania, USA) http://www.paulamoraga.com/tutorial-flexdashboard-example <img src="./figures/pm3.gif" width="70%" style="display: block; margin: auto;" /> <!-- #--- class: inverse, middle, center # Tutorials #--- # R packages for tutorials ```r install.packages(c("sp", "spdep", "raster", "rgdal", "leaflet", "ggplot2", "geoR", "dplyr", "SpatialEpi")) install.packages("INLA", repos = "https://inla.r-inla-download.org/R/stable", dep = TRUE) ``` #--- <div style = 'margin-top: -20px; margin-bottom: -20px;'> # Modeling <div style = 'margin-top: -20px; margin-bottom: -20px;'> Areal data (lung cancer risk in Pennsylvania, USA) http://www.paulamoraga.com/tutorial-areal-data Geostatistical data (malaria prevalence in The Gambia) http://www.paulamoraga.com/tutorial-geostatistical-data Point patterns (sloths occurrence in Costa Rica) http://www.paulamoraga.com/tutorial-point-patterns # Dashboards to communicate results <div style = 'margin-top: -20px; margin-bottom: -20px;'> Interactive dashboards with **flexdashboard** http://www.paulamoraga.com/book-geospatial/sec-flexdashboard.html http://www.paulamoraga.com/book-geospatial/sec-dashboardswithshiny.html Example (lung cancer risk in Pennsylvania, USA) http://www.paulamoraga.com/tutorial-flexdashboard-example --> <style> .pull-left-70 { float: left; width: 70%; } .pull-right-30 { float: right; width: 30%; } </style> --- background-image: url(./figures/Admissions-com-2020-COVID.png) background-size: contain background-position: top right <div style="margin-top:160px"></div> # Join my GeoHealth research group at KAUST! <div style="margin-top:-20px"></div> I am looking for outstanding PhD students and Postdocs to join my group <div style="margin-top:-10px"></div> 👩💻 Potential research areas include the development of innovative statistical methods and computational tools for health and environmental applications - Disease mapping, detection of clusters, early detection of outbreaks - Integration of spatial and spatio-temporal data - Development of R packages for data analysis and visualization 💪 Work closely with collaborators at KAUST and around the world ✈️ Generous travel funding to attend conferences and workshops ✨ Excellent research environment, free tuition, monthly living allowance, free housing, medical insurance, relocation support <!-- https://emoji.muan.co/#rais --> <div style="margin-top:10px"></div> KAUST http://kaust.edu.sa Statistics Program http://stat.kaust.edu.sa Admissions http://admissions.kaust.edu.sa http://studyat.kaust.edu.sa --- <div style="margin-top:-30px;"></div> # References <small> <div style="margin-top:-30px;"></div> .pull-left-70[ Moraga, P. (2019). *Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny*. Chapman & Hall/CRC Press Moraga, P., et al. (2019). epiflows: an R package for risk assessment of travel-related spread of disease. *F1000Research*, 7:1374 Moraga, P. (2018). Small Area Disease Risk Estimation and Visualization Using R. *The R Journal*, 10(1):495-506 Moraga, P. (2017). SpatialEpiApp: A Shiny Web Application for the analysis of Spatial and Spatio-Temporal Disease Data. *Spatial and Spatio-temporal Epidemiology*, 23:47-57 Moraga, P., et al. (2017). A geostatistical model for combined analysis of point-level and area-level data using INLA and SPDE. *Spatial Statistics*, 21:27-41 ] .pull-right-30[ <img src="./figures/bookcover.jpg" style="width:100%; padding-left:30px;"/> ] <div style="margin-bottom:-40px;"> </div> Moraga, P. and Kulldorff, M. (2016). Detection of spatial variations in temporal trends with a quadratic function. *Statistical Methods for Medical Research*, 25(4):1422-1437 Hagan, J. E., Moraga, P., et al. (2016). Spatio-temporal determinants of urban leptospirosis transmission: Four-year prospective cohort study of slum residents in Brazil. *PLOS Neglected Tropical Diseases*, 10(1): e0004275 Moraga, P., et al. (2015). Modelling the distribution and transmission intensity of lymphatic filariasis in sub-Saharan Africa prior to scaling up interventions: integrated use of geostatistical and mathematical modelling. *Parasites & Vectors*, 8:560 </small> --- class: inverse <div style = 'margin-top: 120px; margin-bottom: 40px;'> <span style = 'font-size: 68px; line-height:1.5; font-weight:bold'> Thanks!<br> </span> </div> <span style = 'font-size: 38px; font-weight:bold'> Paula Moraga<br> </span> <span style = 'font-size: 28px; line-height:1.5'> <a href='http://twitter.com/Paula_Moraga_' target='_blank'> <i class='fa fa-twitter fa-fw'></i> @Paula\_Moraga\_</a><br> <a href='https://Paula-Moraga.github.io/' target='_blank'><i class='fa fa-globe fa-fw'></i> www.PaulaMoraga.com</a><br> </span> <style> .pull-left-50 { float: left; width: 50%; } .pull-right-50 { float: right; width: 50%; } .pull-left-60 { float: left; width: 60%; } .pull-right-40 { float: right; width: 40%; } .pull-right-40-padding { float: right; width: 38%; padding-left: 10px } </style>