<div class = "content">
<br>
<center>
<div style = 'margin-top: -80px; margin-bottom: -35px;'>
<p class="text-center" style = 'font-size: 52px; line-height:1.5; font-weight:bold'>Disease Risk Modeling and <br> Visualization using R</p>
</div>
</center>
<center>
<img src="./figures/cover.png" style="width:60%; margin-bottom: -50px;"/>
</center>
<br>
<div style = 'padding: 40px; padding-left: 40px; font-size: 32px; font-weight:bold; margin-bottom: -70px;'>
Paula Moraga, Ph.D.
</div>
<div style = 'padding: 40px; padding-left: 40px; font-size: 26px; line-height:1.5; margin-bottom: 40px;'>
<a href='http://twitter.com/Paula_Moraga_' target='_blank'> <i class='fa fa-twitter fa-fw'></i> @Paula_Moraga_</a>
<a href='https://Paula-Moraga.github.io/' target='_blank'><i class='fa fa-globe fa-fw'></i> www.PaulaMoraga.com</a>
<small>
<a href='http://www.paulamoraga.com/presentation-geospatial-dataviz' target='_blank'><i class='fa fa-link fa-fw'></i> www.paulamoraga.com/presentation-geospatial-dataviz</a>
</small>
</div>
</div>

---

## Geospatial modeling and visualization with R

- Methods to analyze geospatial health data that make it possible to quantify disease burden, understand geographic and temporal patterns, identify risk factors, and measure inequalities

<img src="./figures/typesspatialdata.png" style="width:100%;"/>

- Maps and other visualizations that represent disease risk and risk factors, and presentation options such as interactive dashboards to communicate results

- Examples focus on geospatial health data, but the methods are also useful in other fields that use georeferenced data, including epidemiology, ecology, demography and criminology

---

<div style="margin-top:-20px"></div>

## References

<div style="margin-top:-10px"></div>

Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny (2019, Chapman & Hall/CRC Press)

http://www.paulamoraga.com/book-geospatial/

<center>
<img src="./figures/bookcover.jpg" style="margin-top:-5px; width:40%;"/>
</center>

---

<img src="figures/snow-cholera-map.jpg" width="90%" style="display: block; margin: auto;" />

---

background-image: url(figures/snow-cholera-map-pump-top-margin.png)
background-size: contain

<div style="margin-top:-20px"></div>

# John Snow's map of cholera, London, 1854

---

# Geospatial methods for disease surveillance

Geospatial methods can be used to

- Understand geographic and temporal patterns
- Highlight areas of high risk and detect clusters
- Measure inequalities
- Identify potential risk factors
- Quantify the excess of disease risk close to a putative source
- Detect outbreaks early

This information can guide decision makers and programme managers to better allocate limited resources and to design strategies for disease prevention and control

**Data**: Methods use information about disease cases, individuals at risk, and potential risk factors such as demographic and environmental factors

**Disease models** describe the variability in disease risk as a function of explanatory variables and random effects that account for unexplained variability.
Models allow straightforward extensions to estimate covariate effects and to handle spatio-temporal data and multiple diseases

---

background-image: url(figures/arealgeostatisticalpointpatterns.png)
background-size: contain
background-position: left 50%

<div style="margin-bottom:-20px"></div>

# Types of spatial data

<div style="margin-bottom:500px"></div>

[Moraga and Lawson, Computational Statistics & Data Analysis, 2012](https://doi.org/10.1016/j.csda.2011.11.011)

[Moraga et al., Parasites & Vectors, 2015](https://doi.org/10.1186/s13071-015-1166-x)

[Moraga and Montes, Statistics in Medicine, 2011](https://doi.org/10.1002/sim.4160)

---

# Areal data

<img src="./figures/georgiaLBW.png" width="30%" style="display: block; margin: auto;" />

Disease risk is often estimated by the **Standardized Mortality Ratio (SMR)**

`$$SMR_i = \frac{Y_i}{E_i}$$`

- `\(Y_i\)`: number of observed cases in area `\(i\)`
- `\(E_i\)`: number of expected cases in area `\(i\)` (indirect standardization)

If `\(SMR_i = 1\)`, the number of observed cases equals the number of expected cases

If `\(SMR_i > 1\)`, more cases are observed than expected

If `\(SMR_i < 1\)`, fewer cases are observed than expected

---

# Expected cases

Expected cases are calculated using indirect standardization

The population is stratified by several factors (e.g., age and sex)

The standard population is taken to be the whole population (all areas)

`\(E_i\)` is the expected number of cases in area `\(i\)`: the number of cases one would expect if the population in area `\(i\)` behaved the way the standard population behaves

`$$E_i = \sum_{j=1}^m r_j^{(std)} n_j^{(i)}$$`

- `\(r_j^{(std)}\)`: rate (number of cases)/(population) in stratum `\(j\)` of the standard population
- `\(n_j^{(i)}\)`: population in stratum `\(j\)` of area `\(i\)`
- `\(m\)`: number of strata

---

<div style="margin-top:-30px"></div>

# Standardized Mortality Ratio (SMR)

`\(SMR\)` in area `\(i\)`

`$$SMR_i = \frac{Y_i}{E_i}$$`

`\(Y_i\)` number of observed cases, `\(E_i\)` number of expected cases in
area `\(i\)`

If `\(SMR_i = 1\)`, the number of observed cases equals the number of expected cases

If `\(SMR_i > 1\)`, more cases are observed than expected

If `\(SMR_i < 1\)`, fewer cases are observed than expected

**Example**

`\(SMR_i = \frac{Y_i}{E_i} = \frac{100}{200} = 0.5 < 1\)` `\(\rightarrow\)` area `\(i\)` low risk

`\(SMR_i = \frac{Y_i}{E_i} = \frac{200}{100} = 2 > 1\)` `\(\rightarrow\)` area `\(i\)` high risk

**Limitations**

SMRs may be misleading and unreliable in areas with small populations or for rare diseases. Models make it possible to incorporate covariates and to borrow information from neighboring areas to obtain smoothed relative risks

---

# Areal models

Model to estimate the disease relative risk `\(\theta_i\)` in areas `\(i=1,\ldots,n\)`

`$$Y_i|\theta_i \sim \mbox{Poisson}(E_i \times \theta_i)$$`
`$$\log(\theta_i) = \boldsymbol{z}_i \boldsymbol{\beta} + u_i + v_i$$`

- `\(Y_i\)`: number of observed cases in area `\(i\)`
- `\(E_i\)`: number of expected cases in area `\(i\)`
- `\(\theta_i\)`: relative risk in area `\(i\)`

Fixed effects quantify the effects of the covariates on the disease risk

- `\(\boldsymbol{z}_i = (1, z_{i1}, \ldots, z_{ip})\)`: vector of the intercept and covariates
- `\(\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)'\)`: coefficient vector

Random effects represent residual variation that is not explained by the available covariates

- `\(u_i\)`: structured spatial effect to account for the spatial dependence between relative risks (areas that are close show more similar risk than areas that are not close)
- `\(v_i\)`: unstructured spatial effect to account for independent noise

---

<div style="margin-top:-10px"></div>

# Geostatistical data

<center>
<div style="margin-left:-70px; margin-right:-20px; width:120%;">
<img src="./figures/mbg.png" style="width:100%;"/>
</div>
</center>

---

<div style="margin-top: -45px;">

# Geostatistical models

<div style="margin-top: -20px;">

Models to predict prevalence at unsampled locations

`$$Y_i|P(\boldsymbol{x}_i)\sim \mbox{Binomial}(n_i, P(\boldsymbol{x}_i))$$`
`$$\mbox{logit}(P(\boldsymbol{x}_i)) = \boldsymbol{z}_i \boldsymbol{\beta} + S(\boldsymbol{x}_i) + u_i$$`

`\(Y_i\)` number of people who tested positive, `\(n_i\)` number of people tested, `\(P(\boldsymbol{x}_i)\)` prevalence at location `\(\boldsymbol{x}_i\)`

<div style="margin-top: -10px;">

Covariates are characteristics known to affect disease transmission (temperature, precipitation, vegetation, elevation, population density, etc.)

<div style="margin-top: -10px;">

<!-- (temperature, precipitation, vegetation, elevation, distance to water bodies, urbanization, land cover, population density, etc)
Random effects (Gaussian process with Matern covariance function, Independent risk) -->

Random effects model residual variation not explained by covariates: `\(S(\cdot)\)` is a spatially structured effect (a Gaussian process with Matérn covariance function) and `\(u_i\)` is an unstructured independent effect

<center>
<img src="./figures/covgrf.png" style="width:80%;"/>
</center>

---

# Point patterns

<div style="margin-top:-100px"></div>

<center>
<div style="margin-left:70px; width:40%;">
<img src="./figures/pointdataKidney.png"/>
</div>
</center>

<div style="margin-top:-20px"></div>

Point processes are stochastic models that describe the locations of events of interest and, possibly, additional information such as marks that distinguish different types of events (e.g., cases and controls)

Assume the point pattern `\(\{s_i: i=1, \ldots, n\}\)` has been generated as a realization of a point process. A point process model can be used to identify patterns in the distribution of the observed locations, estimate the intensity of events, and learn about the correlation between the locations and spatial covariates

---

# Coordinate Reference Systems (CRS)

1. **Unprojected or geographic**: Latitude/Longitude for referencing locations on the ellipsoidal Earth.

2. **Projected**: Easting/Northing for referencing locations on a 2-dimensional representation of the Earth. A common projection is the [Universal Transverse Mercator (UTM)](https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system).
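As a small worked illustration of the UTM grid: UTM divides the Earth into 60 zones, each 6° of longitude wide and numbered eastwards from 180°W, so the standard zone of a location follows from its longitude alone. The base R sketch below uses hypothetical helper names (`utm_zone()` and `utm_epsg()` are not functions from any package) and deliberately ignores the irregular zones around Norway and Svalbard. It also returns the EPSG code of the matching WGS 84 / UTM CRS (32600 + zone in the northern hemisphere, 32700 + zone in the southern), which could then be passed to a projection function such as `sf::st_transform()`.

```r
# Standard UTM zone (1-60) for longitudes in decimal degrees.
# Sketch only: ignores the irregular zones around Norway and Svalbard.
utm_zone <- function(lon) {
  stopifnot(is.numeric(lon), all(lon >= -180 & lon <= 180))
  zone <- floor((lon + 180) / 6) + 1
  pmin(zone, 60)  # longitude 180 lies on the eastern edge of zone 60
}

# EPSG code of the WGS 84 / UTM zone: 326xx (north) or 327xx (south)
utm_epsg <- function(lon, lat) {
  ifelse(lat >= 0, 32600, 32700) + utm_zone(lon)
}

# approximate longitudes: central Pennsylvania, London, The Gambia
utm_zone(c(-77.6, -0.13, -15.5))
# [1] 18 30 28
utm_epsg(-77.6, 40.9)
# [1] 32618
```

Regions that straddle a zone boundary, such as Pennsylvania (zones 17 and 18), still require choosing a single zone for the whole study area.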
<img src="./figures/geographic.png" width="34%" /> <img src="./figures/Utm-zones.jpg" width="64%" />

---

class: center, middle, inverse

# Interactive visualizations and dashboards <br> to communicate results

### HTML widgets, R Markdown, flexdashboard, Shiny

---

# HTML widgets

HTML widgets are interactive web visualizations built with JavaScript

http://www.htmlwidgets.org/

http://www.htmlwidgets.org/showcase_leaflet.html

---

# Leaflet

http://rstudio.github.io/leaflet/

<img src="./figures/htmlwidgetsleaflet.png" width="80%" style="display: block; margin: auto;" />

---

# Dygraphs

http://rstudio.github.io/dygraphs/

<img src="./figures/htmlwidgetsdygraphs.png" width="80%" style="display: block; margin: auto;" />

---

# DataTable

http://rstudio.github.io/DT/

<img src="./figures/htmlwidgetsDT.png" width="80%" style="display: block; margin: auto;" />

---

<div style="margin-top:-30px"></div>

# R Markdown

<div style="margin-top:-10px"></div>

R Markdown can be used to turn our analysis into fully reproducible documents that can be shared with others. Output formats include HTML, PDF and Word.

An R Markdown file is written in Markdown syntax with embedded R code, and can include narrative text, tables and visualizations

https://rmarkdown.rstudio.com/

http://www.paulamoraga.com/book-geospatial/sec-rmarkdown.html

<img src="./figures/rmarkdown.png" width="80%" style="display: block; margin: auto;" />

---

# YAML header

```yaml
---
title: "An R Markdown document"
author: "Paula Moraga"
date: "1 July 2019"
output: pdf_document
---
```

---

<div style="margin-top:-25px"></div>

# Markdown syntax

.pull-left[

`**bold text** *italic text*`

**bold text** *italic text*

```rmarkdown
- unordered item
- unordered item

1. first item
2. second item
3. third item
```

- unordered item
- unordered item

1. first item
2. second item
3. third item

]

.pull-right[

```rmarkdown
# First-level header

## Second-level header

### Third-level header
```

# First-level header

## Second-level header

### Third-level header

]

```rmarkdown
$$\int_0^\infty e^{-x^2} dx=\frac{\sqrt{\pi}}{2}$$
```

`$$\int_0^\infty e^{-x^2} dx=\frac{\sqrt{\pi}}{2}$$`

---

# R code chunks

````r
```{r, warning = FALSE}
# R code to be executed
```
````

Options:

- `echo=FALSE`: the code will not be shown in the document, but it will run and its output will be displayed
- `eval=FALSE`: the code will not run, but it will be shown in the document
- `include=FALSE`: the code will run, but neither the code nor its output will be included in the document
- `results='hide'`: the output will not be shown, but the code will run and will be displayed in the document
- `cache=TRUE`: the code chunk is not re-executed if it has been executed before and nothing in it has changed since then
- `warning=FALSE`, `message=FALSE`: suppress warnings or messages; `error=FALSE`: rendering stops at an error instead of showing the error message in the document

---

background-image: url(figures/rmarkdown.png)
background-size: contain
background-position: left 70%

<div style="margin-top:-30px"></div>

# R Markdown

---

<div style="margin-top:-25px"></div>

# Interactive dashboards with flexdashboard

<div style="margin-top:-15px"></div>

The R package **flexdashboard** uses R Markdown to publish a group of related data visualizations as a dashboard

https://rmarkdown.rstudio.com/flexdashboard/

https://rmarkdown.rstudio.com/flexdashboard/examples.html

http://www.paulamoraga.com/book-geospatial/sec-dashboardswithshiny.html

<img src="./figures/pm3.gif" width="90%" style="display: block; margin: auto;" />

---

# Layout

<img src="./figures/flexdashboardlayout3.png" width="100%" style="display: block; margin: auto;" />

---

background-image: url(figures/pm3.gif)
background-size: contain
background-position: left 70%

<div style="margin-top:-20px"></div>

# Interactive dashboards with flexdashboard

---

# Shiny web applications

**Shiny** is a web
application framework for R that makes it possible to build interactive web applications

https://shiny.rstudio.com/

http://www.paulamoraga.com/book-geospatial/sec-shiny.html

Examples

https://shiny.rstudio.com/gallery/single-file-shiny-app.html

https://shiny.rstudio.com/gallery/telephones-by-region.html

---

<div style="margin-top:-40px"></div>

# SpatialEpiApp

<div style="margin-top:-10px"></div>

Shiny app for interactive visualization, disease risk estimation and cluster detection

- Risk estimates obtained by fitting Bayesian models with [INLA](http://www.r-inla.org/)
- Detection of clusters using the scan statistics in [SaTScan](https://www.satscan.org/)

http://www.paulamoraga.com/software/

```r
# install the package from GitHub (requires devtools)
library(devtools)
install_github("Paula-Moraga/SpatialEpiApp")

# load the package and launch the app
library(SpatialEpiApp)
run_app()
```

<img src="./figures/animation.gif" width="100%" style="display: block; margin: auto;" />

---

# Structure of a Shiny App

A Shiny app is a directory that contains an R file called `app.R`

`app.R` has three components:

- `ui`: user interface object that controls the layout and appearance of the app
- `server()`: function with the instructions to build the objects displayed in the `ui`
- a call to `shinyApp()` that creates the Shiny app from the `ui`/`server` pair

---

# Content app.R

```r
library(shiny)

# define user interface object
ui <- fluidPage(
)

# define server() function
server <- function(input, output){
}

# call to shinyApp() which returns the Shiny app
shinyApp(ui = ui, server = server)
```

Save `app.R` inside the `appdir` directory.
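For comparison, a minimal filled-in version of this template, with one input and one output, might look like the sketch below. The IDs `"n"` and `"hist"` and the histogram of random normal draws are arbitrary placeholders chosen for illustration; `sliderInput()`, `plotOutput()` and `renderPlot()` are standard Shiny functions.

```r
library(shiny)

# user interface: one slider input and one plot output
ui <- fluidPage(
  sliderInput(inputId = "n", label = "Number of observations",
              min = 10, max = 500, value = 100),
  plotOutput(outputId = "hist")
)

# server: renderPlot() re-runs whenever input$n changes
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = paste("n =", input$n), xlab = "Value")
  })
}

# build the app object; in an interactive session, runApp(app) launches it
app <- shinyApp(ui = ui, server = server)
```

Because `output$hist` reads `input$n` inside `renderPlot()`, Shiny redraws the histogram each time the slider moves.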
Launch the app:

```r
library(shiny)
runApp("appdir_path")
```

---

# Inputs

<img src="./figures/basicwidgets.png" width="100%" style="display: block; margin: auto;" />

---

# Outputs

Plots, tables, texts, images

<img src="./figures/outputs.png" width="100%" style="display: block; margin: auto;" />

---

# Inputs, outputs and reactivity

Inputs: widgets that let us interact with the app by modifying their values

Outputs: objects we want to show in the app

```r
# * is a placeholder for the widget type,
# e.g. sliderInput(), plotOutput(), renderPlot()
ui <- fluidPage(
  *Input(inputId = "myinput", label = "mylabel", ...),
  *Output(outputId = "myoutput", ...)
)

server <- function(input, output){
  output$myoutput <- render*({
    # code to build the output.
    # If it uses an input value (input$myinput),
    # the output will be rebuilt whenever
    # the input value changes
  })
}
```

---

# Inputs, outputs and reactivity

https://shiny.rstudio.com/gallery/single-file-shiny-app.html

<img src="./figures/exampleappR.png" width="100%" style="display: block; margin: auto;" />

---

class: inverse, middle, center

# Tutorials

---

# R packages for tutorials

```r
# note: rgdal was retired from CRAN in 2023
install.packages(c("sp", "spdep", "raster", "rgdal", "leaflet",
                   "ggplot2", "geoR", "dplyr", "SpatialEpi"))

# INLA is not on CRAN; keeping the CRAN repos lets its dependencies install
install.packages("INLA",
  repos = c(getOption("repos"),
            INLA = "https://inla.r-inla-download.org/R/stable"),
  dep = TRUE)
```

---

# Tutorials

## Introduction

http://www.paulamoraga.com/presentation-geospatial-dataviz

## Modeling

Areal data (lung cancer risk in Pennsylvania, USA)

http://www.paulamoraga.com/tutorial-areal-data

Geostatistical data (malaria in The Gambia)

http://www.paulamoraga.com/tutorial-geostatistical-data

---

# Tutorials

## Creation of dashboards to communicate results

R Markdown

http://www.paulamoraga.com/book-geospatial/sec-rmarkdown.html

Interactive dashboards with **flexdashboard**

http://www.paulamoraga.com/book-geospatial/sec-flexdashboard.html

Example (lung cancer risk in Pennsylvania, USA)

http://www.paulamoraga.com/tutorial-flexdashboard-example

---

<div style="margin-top:-40px"></div>

# Join my Geospatial Health Surveillance research group at KAUST!
<div style="margin-top:-20px"></div>

I am hiring excellent Postdocs, PhD and Master's students to join my group

<div style="margin-top:-10px"></div>

Development of innovative statistical methods and computational tools for disease surveillance that alert public health officials when elevated disease levels are anticipated and provide insights about disease drivers

<div style="margin-top:-10px"></div>

Collaboration with colleagues in Europe, America and Asia on projects to prevent and control diseases such as COVID-19, dengue and malaria

<center>
<iframe width="560" height="215" src="https://studyat.kaust.edu.sa/" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</center>

<div style="margin-top:-10px"></div>

Excellent research environment, free tuition, monthly living allowance, housing, medical and dental coverage, relocation support

<div style="margin-top:-10px"></div>

KAUST http://kaust.edu.sa

Statistics Program http://stat.kaust.edu.sa

Admissions http://admissions.kaust.edu.sa http://studyat.kaust.edu.sa

---

class: inverse

<div style = 'margin-top: 120px; margin-bottom: 40px;'>
<span style = 'font-size: 68px; line-height:1.5; font-weight:bold'>
Thanks!<br>
</span>
</div>

<span style = 'font-size: 38px; font-weight:bold'>
Paula Moraga<br>
</span>

<span style = 'font-size: 28px; line-height:1.5'>
<a href='mailto:pm865@bath.ac.uk' target='_blank'><i class='fa fa-envelope-square fa-fw'></i> pm865@bath.ac.uk</a><br>
<a href='http://twitter.com/Paula_Moraga_' target='_blank'> <i class='fa fa-twitter fa-fw'></i> @Paula\_Moraga\_</a><br>
<a href='https://Paula-Moraga.github.io/' target='_blank'><i class='fa fa-globe fa-fw'></i> www.PaulaMoraga.com</a><br>
</span>