Assistant Professor of Statistics
King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Geospatial methods use disease, population and other data at several spatial and temporal resolutions to understand geographic and temporal patterns, identify risk factors, measure inequalities, and detect outbreaks early. Results help inform the development of strategies for disease prevention and control
Disease mapping is important to understand geographic and temporal patterns of diseases and allocate resources where most needed
Often, maps are given at an areal resolution, which makes decision-making difficult
The map shows malaria prevalence in Mozambique. However, disease risk varies continuously in space, and areal data cannot show how risk varies within areas
Areal estimates make it difficult to target health interventions and direct resources where they are most needed
High-resolution estimates make it possible to find differences in disease risk within study regions and to identify areas and groups of people at higher risk
Model assumes there is a spatially continuous variable underlying all observations that can be modeled using a zero-mean Gaussian random field
\[\begin{equation*} \begin{aligned} Y(\mathbf{x}) & \sim \pi \left( \theta(\mathbf{x}), \tau \right), \quad \mathbf{x} \in A \subset \mathbb{R}^2, \\ \theta(\mathbf{x}_i) & = g^{-1}\left(\mu(\mathbf{x}_i)+S\left(\mathbf{x}_i\right) \right), \quad i=1, \ldots, n, \\ \theta(B_i) & =\left|B_i\right|^{-1} \int_{B_i} g^{-1}(\mu (\mathbf{x}) + S(\mathbf{x})) d \mathbf{x}, \quad i=n+1, \ldots, n+m, \end{aligned} \end{equation*}\]
Inference using INLA and a modification of the SPDE approach
Integrated nested Laplace approximations (INLA) is a computational approach to perform approximate Bayesian inference in latent Gaussian models
In the SPDE approach, the continuously indexed Gaussian random field \(S\) is represented as a discretely indexed Gaussian Markov random field (GMRF) by means of a finite basis function representation defined on a triangulation of the study region
\[S(\boldsymbol{x}) = \sum_{g=1}^G \psi_g(\boldsymbol{x}) S_g\]
\(\psi_g(\cdot)\) piecewise polynomial basis functions on each triangle
\(\{S_g \}\) zero-mean Gaussian distributed
\(G\) number of vertices in triangulation
\(S(\boldsymbol{x})\) weighted average of the GMRF values at the vertices of the triangle containing the point. Weights = barycentric coordinates
\[S(\boldsymbol{x}) \approx \frac{T_{1}}{T}S_1 + \frac{T_{2}}{T}S_2 + \frac{T_{3}}{T}S_3\]
\(T_1, T_2, T_3\) areas of the subtriangles formed by \(\boldsymbol{x}\) and the vertices. \(T\) area of the whole triangle
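The barycentric weighting above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the example triangle and the vertex values are hypothetical, and \(T_k\) is taken to be the area of the subtriangle opposite vertex \(k\), so that the weight of a vertex grows as \(\boldsymbol{x}\) approaches it.

```python
def triangle_area(p, q, r):
    """Unsigned area of the triangle with 2D vertices p, q, r."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def barycentric_weights(x, v1, v2, v3):
    """Return (T1/T, T2/T, T3/T) for a point x inside triangle (v1, v2, v3).

    Assumption: T_k is the area of the subtriangle formed by x and the two
    vertices other than v_k.
    """
    total = triangle_area(v1, v2, v3)
    t1 = triangle_area(x, v2, v3)
    t2 = triangle_area(v1, x, v3)
    t3 = triangle_area(v1, v2, x)
    return (t1 / total, t2 / total, t3 / total)


def interpolate(x, vertices, values):
    """Approximate S(x) as the barycentric-weighted average of vertex values."""
    weights = barycentric_weights(x, *vertices)
    return sum(w * s for w, s in zip(weights, values))


# Hypothetical triangle and GMRF values S_1, S_2, S_3 at its vertices
vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
s_values = [1.0, 2.0, 4.0]

weights = barycentric_weights((0.25, 0.25), *vertices)
s_at_x = interpolate((0.25, 0.25), vertices, s_values)
```

The weights are non-negative and sum to one for any point inside the triangle, and at a vertex the interpolated value equals the GMRF value at that vertex.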
\(S(B)=|B|^{-1} \int_{B} S(\boldsymbol{x})d\boldsymbol{x}\) weighted average of the GMRF values at the vertices of the triangles within the area. Weights = \(\mbox{ (number vertices)}^{-1}\)
\[S(B) \approx \frac{1}{m} \sum_{g \in B} S_g\]
\[S(\boldsymbol{x}_i) \approx \sum_{g=1}^G A_{ig} S_g, \qquad S(B_j) \approx \sum_{g=1}^G A_{jg} S_g\] \(A\) projection matrix that maps the GMRF from the triangulation nodes to the observation locations and areas
Row \(i\) of point observation: at most three non-zero values, at the columns of the vertices of the triangle containing the point (= barycentric coordinates)
Row \(j\) of area: non-zero values at the columns of the \(m\) vertices inside the area (= \(1/m\))
\[A = \begin{bmatrix} A_{11} & A_{12} & A_{13} & \dots & A_{1G} \\ A_{21} & A_{22} & A_{23} & \dots & A_{2G} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ A_{n1} & A_{n2} & A_{n3} & \dots & A_{nG} \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 & \dots & 0 \\ .2 & .2 & 0 & \dots & .6 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1/m & 1/m & 1/m & \dots & 0 \end{bmatrix}\]
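A minimal sketch of how such a matrix could be assembled and applied, assuming the containing triangle of each point and the set of vertices inside each area are already known. The helper names, the five-vertex mesh and the vertex values are hypothetical; a real analysis would rely on the mesh tools of the modeling software rather than hand-built rows.

```python
def point_row(n_vertices, triangle, weights):
    """Row of A for a point: barycentric weights at the triangle's vertices."""
    row = [0.0] * n_vertices
    for g, w in zip(triangle, weights):
        row[g] = w
    return row


def area_row(n_vertices, vertices_in_area):
    """Row of A for an area: equal weight 1/m at each of its m vertices."""
    row = [0.0] * n_vertices
    m = len(vertices_in_area)
    for g in vertices_in_area:
        row[g] = 1.0 / m
    return row


def project(A, s):
    """Compute A s, the field evaluated at the observed points and areas."""
    return [sum(a * s_g for a, s_g in zip(row, s)) for row in A]


G = 5  # number of mesh vertices (hypothetical)
A = [
    point_row(G, triangle=(0, 1, 4), weights=(0.2, 0.2, 0.6)),  # interior point
    point_row(G, triangle=(0, 1, 2), weights=(0.0, 0.0, 1.0)),  # point on vertex 2
    area_row(G, vertices_in_area=[0, 1, 2, 3]),                 # area with m = 4
]
s = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical GMRF values at the vertices
projected = project(A, s)
```

Every row sums to one, so each projected value is a weighted average of the GMRF values at the vertices.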
\[\begin{aligned} Y(\mathbf{x}) & \sim \operatorname{Binomial}\left(N(\mathbf{x}), P(\mathbf{x})\right), \quad \mathbf{x} \in A \subset \mathbb{R}^2, \\ P(\mathbf{x}_i) & = \text{logit}^{-1}\left(\mu(\mathbf{x}_i)+S(\mathbf{x}_i)\right), \quad i=1, \ldots, n, \\ P(B_i) & = \left|B_i\right|^{-1} \int_{B_i} \text{logit}^{-1} \left(\mu (\mathbf{x}) + S(\mathbf{x}) \right) d \mathbf{x}, \quad i=n+1, \ldots, n+m. \end{aligned}\]
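To illustrate the area-level prevalence \(P(B_i)\), the sketch below approximates the integral by an equally weighted average of \(\text{logit}^{-1}(\mu + S)\) over a few integration points inside the area. The equal weighting, the function names and the numeric values are assumptions made for illustration only, not the approximation used for inference.

```python
import math


def inv_logit(eta):
    """Inverse logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def area_prevalence(mu_values, s_values):
    """Approximate P(B) by averaging inv_logit(mu + S) over points in B.

    Assumption: the integration points carry equal weight.
    """
    probs = [inv_logit(m + s) for m, s in zip(mu_values, s_values)]
    return sum(probs) / len(probs)


# Hypothetical mean and random-field values at four points inside one area
mu = [-1.0, -1.0, -1.0, -1.0]
s = [-2.0, 0.0, 0.0, 2.0]

p_area = area_prevalence(mu, s)

# Plug-in alternative: apply the link to the averaged linear predictor
p_plugin = inv_logit(sum(m + v for m, v in zip(mu, s)) / len(mu))
```

Because the inverse logit is nonlinear, `p_area` and `p_plugin` generally differ, which is why the model averages prevalence over the area rather than averaging the linear predictor.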
A disease cluster is an unusual aggregation of cases occurring together in a particular place and time. Detection of clusters is crucial to determine whether they are due to chance or to specific environmental or occupational risk factors, so that resources can be allocated and health threats addressed more effectively
Disease cases are typically aggregated at areal resolution based on administrative boundaries, mainly for confidentiality reasons
Traditional cluster detection methods utilizing areal data often identify clusters comprising multiple areas, despite disease risk varying continuously in space
We propose a method to detect clusters of any shape, independent of administrative boundaries
First obtain risk surfaces from a Bayesian spatial disaggregation model
\[\begin{equation*} \begin{aligned} Y(\mathbf{x}) & \sim \pi \left( \theta(\mathbf{x}), \tau \right), \quad \mathbf{x} \in A \subset \mathbb{R}^2, \\ \theta(\mathbf{x}_i) & = g^{-1}\left(\mu(\mathbf{x}_i)+S\left(\mathbf{x}_i\right) \right), \quad i=1, \ldots, n, \\ \theta(B_i) & =\left|B_i\right|^{-1} \int_{B_i} g^{-1}(\mu (\mathbf{x}) + S(\mathbf{x})) d \mathbf{x}, \quad i=n+1, \ldots, n+m, \end{aligned} \end{equation*}\]
Then use exceedance probabilities to identify high-risk locations
\[P(\theta(\mathbf{x}) > \mbox{threshold})\]
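As a rough illustration, an exceedance probability can be estimated as the share of posterior draws of \(\theta(\mathbf{x})\) above the threshold. The draws, location names, threshold and cutoff below are hypothetical, and the Monte Carlo estimate is an assumption of this sketch; the probability can also be computed directly from the posterior marginal.

```python
def exceedance_probability(samples, threshold):
    """Monte Carlo estimate of P(theta > threshold) from posterior draws."""
    return sum(1 for value in samples if value > threshold) / len(samples)


def flag_high_risk(samples_by_location, threshold, cutoff):
    """Locations whose exceedance probability is at least the cutoff."""
    return [
        location
        for location, samples in samples_by_location.items()
        if exceedance_probability(samples, threshold) >= cutoff
    ]


# Hypothetical posterior draws of the risk at two prediction locations
posterior_draws = {
    "loc_a": [0.8, 1.1, 1.3, 1.4, 1.6],
    "loc_b": [0.7, 0.9, 1.0, 1.1, 0.8],
}

prob_a = exceedance_probability(posterior_draws["loc_a"], threshold=1.0)
prob_b = exceedance_probability(posterior_draws["loc_b"], threshold=1.0)
flagged = flag_high_risk(posterior_draws, threshold=1.0, cutoff=0.75)
```

Locations with an exceedance probability at or above the chosen cutoff are treated as high risk; both the threshold and the cutoff are analyst choices.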
Through simulation, the disaggregation model showed high sensitivity and competitive specificity when compared to the circular scan statistic, flexible scan statistic, and exceedance probabilities from a Bayesian areal model
Simulation of a cluster shaped as a rectangle with a hole
Detecting clusters of lung cancer in Pennsylvania
Disease surveillance systems are critical to early detection of epidemics and the design of control strategies
Traditional surveillance systems rely on data gathered with considerable delay, which makes them ineffective for real-time surveillance
Real-time digital information may enable earlier detection of outbreaks
“Flu plus fever, not a good way to start the weekend”
“I’m so irritated at this cough and fever”
“This flu, fever & throat ache won’t let me sleep”
Dengue is a mosquito-borne disease that poses significant public health challenges in tropical and sub-tropical regions, including Brazil.
Many dengue cases only result in mild, flu-like illness, but some can be severe and even fatal.
Dengue does not have a specific treatment, but early detection and timely access to proper medical care significantly reduce the fatality rates associated with severe cases. Prevention focuses on personal protection and mosquito control.
Surveillance systems are crucial for dengue prevention and control.
Aedes aegypti
Vector control efforts
InfoDengue is a data collection and analysis system that generates indicators of dengue and other arboviruses in Brazil: https://info.dengue.mat.br/
In principle, dengue is meant to be reported within seven days of case identification. In practice,
Reported dengue cases in Rio de Janeiro, January 2011 to April 2012. Red line: cases reported for those weeks.
Black line: cases eventually reported after 10 weeks.
Assess the value of Google Trends information to complement reported dengue cases for understanding dengue activity levels
The Google Trends index for a specific keyword ranges from 0 to 100. It is calculated as the number of searches for that keyword divided by the total number of searches in the region and time period, so that relative popularity can be compared
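A minimal sketch of an index of this kind, assuming the weekly search share is rescaled so that the week with the highest share gets the value 100. The function name and the counts are hypothetical; raw search counts are not published, so this only mimics the normalization.

```python
def trends_index(keyword_counts, total_counts):
    """Relative search interest on a 0 to 100 scale.

    Assumption: each week's share (keyword searches / total searches) is
    divided by the largest share in the period and multiplied by 100.
    """
    shares = [k / t for k, t in zip(keyword_counts, total_counts)]
    peak = max(shares)
    if peak == 0:
        return [0 for _ in shares]
    return [round(100 * share / peak) for share in shares]


# Hypothetical weekly counts for one region
keyword_counts = [50, 200, 100]
total_counts = [10000, 20000, 10000]
index = trends_index(keyword_counts, total_counts)
```

The second week has the most keyword searches but the same share as the third, so both reach the maximum of the index: the index tracks relative popularity, not search volume.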
Weekly Google Trends index for keyword ‘dengue’ in Brazil, 2019 to 2024.
The objective of this work is to assess the usefulness of Google Trends data for weekly dengue nowcasting in each of the 27 Brazilian states.
We collect weekly reported dengue cases and Google Trends indices, fit several nowcasting models using different information, and compare nowcasts with the actual cases reported after 10 weeks. Models incorporate:
Performance evaluated using error measures and uncertainty intervals
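The comparison of nowcasts with the counts reported after 10 weeks can be sketched as follows. The specific measures (mean absolute error, root mean squared error, interval coverage) and all numbers are assumptions chosen for illustration; the text above does not state which error measures were used.

```python
import math


def mean_absolute_error(nowcasts, actual):
    """Average absolute difference between nowcasts and final counts."""
    return sum(abs(n - a) for n, a in zip(nowcasts, actual)) / len(actual)


def root_mean_squared_error(nowcasts, actual):
    """Square root of the average squared nowcast error."""
    return math.sqrt(sum((n - a) ** 2 for n, a in zip(nowcasts, actual)) / len(actual))


def interval_coverage(lower, upper, actual):
    """Share of weeks whose final count falls inside the uncertainty interval."""
    inside = sum(1 for lo, hi, a in zip(lower, upper, actual) if lo <= a <= hi)
    return inside / len(actual)


# Hypothetical weekly values for one state
actual = [100, 120, 150, 90]     # cases eventually reported after 10 weeks
nowcasts = [110, 115, 140, 100]  # model nowcasts made in real time
lower = [95, 100, 120, 95]       # lower interval bounds
upper = [125, 130, 145, 110]     # upper interval bounds

mae = mean_absolute_error(nowcasts, actual)
rmse = root_mean_squared_error(nowcasts, actual)
coverage = interval_coverage(lower, upper, actual)
```

Lower error values indicate nowcasts closer to the final counts, while coverage close to the nominal level of the intervals indicates well-calibrated uncertainty.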
Results vary by state. In general, Google Trends and InfoDengue are the best-performing approaches
Dengue-tracker provides weekly updates on the number of dengue cases per state in Brazil
We present official and corrected case counts incorporating information from Google Trends
Reports assist policymakers and the general public in understanding dengue levels and guide their decisions
Geospatial health problems deal with data that come from different sources and are available at several spatial and spatio-temporal resolutions
We have presented a flexible and fast model-based approach to combine data at different spatial resolutions. This model can be extended to address many problems of interest (including covariates, preferential sampling, spatio-temporal settings). It has applications in a wide range of disciplines where information at different spatial resolutions needs to be combined
The dengue study highlights the value of digital data in improving traditional surveillance systems. This is preliminary research; future studies will address search query biases and use spatial models and covariates to obtain predictions at higher resolutions for more actionable insights.
Integrating health, climate, environmental, socio-economic, and digital data sources enhances traditional surveillance systems, leading to better decision-making and improved health and well-being of the population.
KAUST is an international university located on the shores of the Red Sea
All students receive a living allowance, free housing and medical coverage
👩‍💻 Potential research areas include the development of innovative statistical and computational methods for health and environmental applications
💪 Work closely with collaborators at KAUST and around the world
✈️ Generous travel funding for conferences and collaborative work
✨ Excellent research environment. Superb equipment and research facilities
Zhong, et al. (2024). Spatial data fusion adjusting for preferential sampling using integrated nested Laplace approximation and stochastic partial differential equation. Journal of the Royal Statistical Society Series A: Statistics in Society
Pavani, et al. (2023). Joint spatial modeling of the risks of co-circulating mosquito-borne diseases in Ceara, Brazil. Spatial and Spatio-temporal Epidemiology
Zhong and Moraga (2023). Bayesian Hierarchical Models for the Combination of Spatially Misaligned Data: A Comparison of Melding and Downscaler Approaches Using INLA and SPDE. Journal of Agricultural, Biological and Environmental Statistics, 29, 110–129
Ribeiro Amaral, et al. (2022). Spatio-temporal modeling of infectious diseases by integrating compartment and point process models. Stochastic Environmental Research and Risk Assessment, 37, 1519-1533
Moraga and Baker (2022). rspatialdata: a collection of data sources and tutorials on downloading and visualising spatial data using R. F1000Research, 11:77
Moraga (2018). Small Area Disease Risk Estimation and Visualization Using R. The R Journal, 10(1):495-506
Moraga (2017). SpatialEpiApp: A Shiny Web Application for the analysis of Spatial and Spatio-Temporal Disease Data. Spatial and Spatio-temporal Epidemiology, 23:47-57
Moraga, et al. (2017). A geostatistical model for combined analysis of point-level and area-level data using INLA and SPDE. Spatial Statistics, 21:27-41