Health data provide information to identify public health problems and respond appropriately when they occur. This information is crucial to prevent and control a variety of health conditions such as infectious diseases, non-communicable diseases, injuries, and health-related behaviors. The process of analysis and interpretation of health data encompasses a broad variety of system designs, analytic methods, modes of presentation, and interpretive uses (Lee et al. 2010). In general, descriptive methods are the basis of routine reporting of surveillance data. These focus on the observed patterns in the data and might also seek to compare the relative occurrence of health outcomes in different subgroups. More specialized hypotheses are explored using inferential methods, which aim to draw statistical conclusions about the patterns or outcomes of health.
The increased availability of georeferenced health information, population data, and satellite imagery of environmental factors that influence disease activity levels, together with the development of geographic information systems (GIS) and software for geocoding addresses, has facilitated the investigation of spatial and spatio-temporal variations of disease. John Snow’s cholera-outbreak investigation in London in 1854 provides one of the most famous examples of spatial analysis. Snow used a map to illustrate how cholera deaths appeared to be clustered around a public water pump. The assessment of the spatial pattern of the cholera cases was important in identifying the source of the infection and gave support to the theory of cholera transmission through drinking water (Snow 1857).
There is a wide range of spatial and spatio-temporal methods for disease surveillance, including methods for disease mapping, clustering, and geographic correlation studies. Many of these methods may be used to highlight areas of high risk (Moraga and Lawson 2012), identify risk factors (Hagan et al. 2016), assess spatial variations in temporal trends (Moraga and Kulldorff 2016), quantify the excess of disease risk close to a putative source (Wakefield and Morris 2001), and detect outbreaks early (Polonsky et al. 2019; Moraga et al. 2019).
The mapping of disease risk has a long history in public health surveillance. Disease maps provide a rapid visual summary of spatial information and allow the identification of patterns that may be missed in tabular presentations (Elliott and Wartenberg 2004). Such maps are crucial for describing the spatial and temporal variation of the disease, identifying areas of unusually high risk, formulating etiological hypotheses, measuring inequalities, and allowing better resource allocation.
Disease risk estimates are based on information on the observed disease cases, the number of individuals at risk, and possibly also covariate information such as demographic and environmental factors. Bayesian hierarchical models are used to describe the variability in the response variable as a function of risk factor covariates and random effects that account for unexplained variation. Bayesian modeling provides a flexible and robust approach that makes it possible to account for the effects of explanatory variables, accommodate spatial and spatio-temporal correlation, and formally express the uncertainty in the risk estimates (Moraga 2018). Bayesian inference can be implemented via Markov chain Monte Carlo (MCMC) methods or by using the integrated nested Laplace approximation (INLA), a computationally efficient alternative to MCMC designed for latent Gaussian models (Lindgren and Rue 2015).
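As a point of reference for the quantities named above, the following minimal Python sketch computes a crude relative-risk estimate for each area as the ratio of observed to expected cases, where expected counts are obtained by applying the overall rate to each area's population. The area labels, the counts, and the helper `crude_relative_risks` are hypothetical and chosen only for illustration. This ratio is an unsmoothed starting point under a simple assumption of a common rate; it is not the Bayesian hierarchical model itself, which would additionally include covariates and random effects.

```python
def crude_relative_risks(observed, population):
    """Return observed/expected case ratios per area (hypothetical helper).

    Expected counts assume every area shares the overall rate
    (total cases / total population); no age or sex standardization
    is applied in this sketch.
    """
    overall_rate = sum(observed.values()) / sum(population.values())
    ratios = {}
    for area, cases in observed.items():
        expected = population[area] * overall_rate
        ratios[area] = cases / expected
    return ratios


# Illustrative, made-up data for three areas.
observed = {"A": 30, "B": 10, "C": 20}
population = {"A": 10000, "B": 10000, "C": 10000}

ratios = crude_relative_risks(observed, population)
for area, ratio in ratios.items():
    print(f"{area}: {ratio:.2f}")
```

A ratio above 1 indicates more cases than expected under the common-rate assumption, and a ratio below 1 indicates fewer. Ratios of this kind can be unstable in areas with small populations, which is one motivation for model-based estimates.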
Health data are often obtained by aggregating point data over subareas of the study region, such as counties or provinces, for reasons such as patient confidentiality. Often, disease risk models aim to obtain low variance estimates of disease risk within the same areas where data are available. One limitation of this approach is that disease risk maps obtained at this resolution are unable to show how risk varies within areas, which makes it difficult to target health interventions and direct resources where they are most needed. A better approach is to use point data and build models that exploit correlation between nearby data points and include high spatial resolution covariates to produce disease risk estimates on a continuous surface (Moraga et al. 2017; Diggle et al. 2013). Maps obtained with this type of model offer high spatial resolution estimates that help implement public health programs more precisely where they can have the greatest impact.
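To illustrate what correlation between nearby data points can look like, the sketch below evaluates an exponential distance-decay correlation, exp(-d / phi), between pairs of point locations. The exponential form, the range parameter `phi`, the coordinates, and the function name are all assumptions made for illustration; the models cited above may use other correlation functions.

```python
import math


def exponential_correlation(p, q, phi):
    """Correlation exp(-d / phi) between two points (hypothetical helper).

    p, q: (x, y) coordinates in the same planar units as phi.
    phi:  assumed range parameter; larger values mean slower decay.
    """
    distance = math.dist(p, q)  # Euclidean distance (Python 3.8+)
    return math.exp(-distance / phi)


# Made-up sampling locations in planar coordinates.
site = (0.0, 0.0)
near = (1.0, 0.0)
far = (10.0, 0.0)

phi = 5.0  # assumed range parameter, same units as the coordinates
corr_near = exponential_correlation(site, near, phi)
corr_far = exponential_correlation(site, far, phi)
print(f"near: {corr_near:.3f}, far: {corr_far:.3f}")
```

Under this assumed form, observations at nearby locations are more strongly correlated than observations far apart, so a measurement at one site carries information about risk at neighboring unsampled sites.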
The goal of health surveillance is not merely to collect data for analysis, but to guide public health policy and action to control and prevent diseases. A key aspect of surveillance practice is, therefore, the proper and timely dissemination of information to those responsible for disease prevention and control. Depending on the circumstances, these may include health agencies, governments, private organizations, potentially exposed individuals, and many others.
The R software provides excellent tools that greatly facilitate effective communication with collaborators, decision makers, and the general public, and these should be used consistently and thoughtfully to respond quickly to the population’s health needs. R offers visualization packages such as leaflet (Cheng, Karambelkar, and Xie 2021) for making interactive maps, dygraphs (Vanderkam et al. 2018) for plotting time series, and DT (Xie, Cheng, and Tan 2021) for displaying data tables. Moreover, findings can be easily included in reproducible reports generated with R Markdown (Allaire et al. 2021), interactive dashboards using flexdashboard (Iannone, Allaire, and Borges 2020), and interactive web applications built with Shiny (Chang et al. 2021). These tools provide important information on which to base action, and careful interpretation of their output allows public health officers to allocate resources efficiently and target populations for education or preventive programs.