11 Areal data issues

Spatial analyses of aggregated data may be subject to the Misaligned Data Problem (MIDP), which refers to situations where the spatial data being analyzed are at a different scale than the one at which they were originally collected (Banerjee, Carlin, and Gelfand 2004). For example, individual observations or small-area data may be aggregated to larger areas for reasons such as confidentiality or the need to match the scale of other data sources. Aggregation causes a loss of spatial information that can hide patterns or relationships existing at the finer scale, potentially leading to erroneous conclusions or misinterpretation of the results.
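As a minimal illustration of how aggregation can hide a finer-scale pattern, the Python sketch below builds a hypothetical 4 x 4 grid whose cells alternate between high (1) and low (0) values and averages it into 2 x 2 blocks. The grid, the block size and the helper names `checkerboard` and `aggregate` are assumptions chosen for illustration only.

```python
def checkerboard(n):
    """Return a hypothetical n x n grid whose cells alternate between 1 and 0."""
    return [[(i + j) % 2 for j in range(n)] for i in range(n)]


def aggregate(grid, block):
    """Average non-overlapping block x block groups of cells into coarse cells.

    Assumes the grid side length is a multiple of `block`.
    """
    n = len(grid)
    coarse = []
    for bi in range(0, n, block):
        row = []
        for bj in range(0, n, block):
            cells = [grid[i][j]
                     for i in range(bi, bi + block)
                     for j in range(bj, bj + block)]
            row.append(sum(cells) / len(cells))
        coarse.append(row)
    return coarse


fine = checkerboard(4)
coarse = aggregate(fine, 2)
print(fine)    # [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
print(coarse)  # [[0.5, 0.5], [0.5, 0.5]]
```

Every coarse cell ends up with the same value of 0.5, so an analysis of the aggregated data alone could not detect the alternating structure present at the finer scale.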

The Modifiable Areal Unit Problem (MAUP) (Openshaw 1984) refers to the fact that the results of spatial analyses may change when the same underlying data are aggregated to a different set of spatial units. The MAUP consists of two interrelated effects, namely, the scale and zoning effects. The scale effect occurs when the results of an analysis change because the geographic units used for analysis are aggregated or disaggregated. For example, if a study examines crime rates across neighborhoods, the patterns observed may differ depending on whether the analysis is conducted at the level of city blocks, census tracts, or larger administrative zones. The zoning effect arises when the results of an analysis are affected by the arbitrary way in which the boundaries of the geographic units are drawn, even when the number of units stays the same. For example, when examining income levels across districts, the specific boundaries assigned to each district can influence the results. Thus, different configurations of boundaries can produce different spatial patterns and, consequently, different outcomes of the analysis.
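Both effects can be reproduced with a toy example. The Python sketch below uses made-up values of two variables, `x` and `y`, in eight contiguous cells along a transect, and computes the Pearson correlation for the individual cells and for two hypothetical zonings that both group the cells into four zones but place the boundaries differently. The data, zonings and helper functions are illustrative assumptions, not taken from a real study.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def zone_means(values, zoning):
    """Average the cell values within each zone (a zone is a list of cell indices)."""
    return [sum(values[i] for i in zone) / len(zone) for zone in zoning]


# Made-up values of two variables in eight contiguous cells along a transect.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [5, 1, 6, 2, 7, 3, 8, 4]

# Two hypothetical zonings with the same number of zones but different boundaries.
zoning_a = [[0, 1], [2, 3], [4, 5], [6, 7]]
zoning_b = [[0], [1, 2], [3, 4], [5, 6, 7]]

r_cells = pearson(x, y)
r_a = pearson(zone_means(x, zoning_a), zone_means(y, zoning_a))
r_b = pearson(zone_means(x, zoning_b), zone_means(y, zoning_b))

print(f"cells: {r_cells:.3f}")  # cells: 0.286
print(f"zoning A: {r_a:.3f}")   # zoning A: 1.000
print(f"zoning B: {r_b:.3f}")   # zoning B: 0.272
```

Moving from the eight cells to the four zones of zoning A changes the correlation from 0.286 to 1.000, a scale effect, while zoning B, with the same number of zones but shifted boundaries, gives 0.272, a zoning effect.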

Ecological studies, also known as population-level studies, investigate the relationships between exposure factors or interventions and health outcomes at the group or population level (Robinson 1950). Instead of focusing on individual-level data, ecological studies analyze aggregated data for groups or populations, such as cities, regions, or countries. Ecological studies are useful when individual-level data are not available or are difficult to collect. However, they are susceptible to the ecological fallacy, in which associations observed at the group level do not necessarily hold for individuals within those groups. The resulting bias, known as ecological bias, can be viewed as a special case of the MAUP, as it comprises two effects similar to the scale (aggregation) and zoning effects of the MAUP (Gotway and Young 2002). These are the aggregation bias, which arises from grouping individuals together, and the specification bias, which results from the uneven distribution of confounding variables across the groups created by the grouping.
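The ecological fallacy can be illustrated with a hypothetical example of three groups (for example, regions) with three individuals each, where `x` is an exposure and `y` an outcome. The numbers are made up so that `y` decreases with `x` within every group, while the group means increase together. The Python sketch below computes both the within-group correlations and the ecological (group-level) correlation; the data and helper names are assumptions for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


# Hypothetical individuals in three groups: (exposure values x, outcome values y).
groups = {
    "A": ([1, 2, 3], [3, 2, 1]),
    "B": ([4, 5, 6], [6, 5, 4]),
    "C": ([7, 8, 9], [9, 8, 7]),
}

# Individual-level association within each group.
within = {g: pearson(xs, ys) for g, (xs, ys) in groups.items()}

# Ecological analysis: correlate the group means only.
mean_x = [sum(xs) / len(xs) for xs, _ in groups.values()]
mean_y = [sum(ys) / len(ys) for _, ys in groups.values()]
ecological = pearson(mean_x, mean_y)

print(within)      # {'A': -1.0, 'B': -1.0, 'C': -1.0}
print(ecological)  # 1.0
```

An analysis based only on the group means would suggest a perfect positive association, which does not hold for the individuals within any of the groups.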

Measurements of a spatial phenomenon can be obtained at various spatial resolutions and from diverse sources. For instance, air pollution measurements can be gathered from monitoring stations at specific locations, as well as from satellite-derived products that provide information aggregated over areas. Integrating these data can yield more accurate air pollution predictions at finer spatial resolutions than those obtained using just one type of data. Moraga et al. (2017) propose a Bayesian melding model to combine spatially misaligned data that assumes a common spatially continuous Gaussian random field underlying all observations, and uses the integrated nested Laplace approximation (INLA) and stochastic partial differential equation (SPDE) approaches for fast inference. Zhong and Moraga (2023) compare the Bayesian melding model with a Bayesian downscaler approach that integrates point- and area-level data through a model with spatially varying coefficients, in which the point data are the response and the areal data are covariates. They also use air pollution data to show how the melding model can be used to disaggregate areal data and produce spatially continuous predictions, as well as predictions at policy-relevant spatial resolutions that can improve decision-making.
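The idea of a common field underlying both data types can be sketched by simulation. The Python code below is a simplified, hypothetical version of such a data model: it assumes that both data types measure the same surface mu(s) = beta0 + S(s), where S is a zero-mean Gaussian random field with an exponential covariance; that point observations equal mu at their locations plus noise; and that each areal observation equals the average of mu over the area plus noise, with the average approximated by the mean over grid nodes inside the area. All parameter values, grid sizes and function names are assumptions for illustration. The code only simulates data; it does not perform the INLA-SPDE inference used by Moraga et al. (2017) and does not claim to reproduce their exact model specification.

```python
import math
import random

# Hypothetical settings: an 8 x 8 grid of nodes on the unit square, intercept
# BETA0, and an exponential covariance with variance SIGMA2 and range PHI.
N, BETA0, SIGMA2, PHI = 8, 10.0, 1.0, 0.3
NODES = [((i + 0.5) / N, (j + 0.5) / N) for i in range(N) for j in range(N)]


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_field(rng):
    """Draw mu(s) = BETA0 + S(s) at every node, S a zero-mean Gaussian field."""
    cov = [[SIGMA2 * math.exp(-math.dist(p, q) / PHI) for q in NODES] for p in NODES]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in NODES]
    # S = L z has covariance L L^T = cov.
    return [BETA0 + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(NODES))]


def quadrant(node):
    """Index (qx, qy) of the 2 x 2 block of the unit square (the 'area') containing a node."""
    x, y = node
    return int(x * 2), int(y * 2)


def observe(field, rng, n_points=10, tau_point=0.1, tau_area=0.1):
    """Point data: the field at randomly sampled nodes plus Gaussian noise.
    Areal data: the field averaged over the nodes in each area plus Gaussian noise."""
    sites = rng.sample(range(len(NODES)), n_points)
    points = {i: field[i] + rng.gauss(0.0, tau_point) for i in sites}
    areas = {}
    for area in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        members = [k for k, node in enumerate(NODES) if quadrant(node) == area]
        block_mean = sum(field[k] for k in members) / len(members)
        areas[area] = block_mean + rng.gauss(0.0, tau_area)
    return points, areas


field = simulate_field(random.Random(1))
points, areas = observe(field, random.Random(2))
```

Fitting a melding model would reverse this process, estimating the field and its parameters jointly from `points` and `areas`. The dense Cholesky factorization used here is only practical for small grids, which is one motivation for the INLA and SPDE approaches mentioned above.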