Constrained spanning tree algorithms for irregularly-shaped spatial clustering

  • Authors:
  • Marcelo Azevedo Costa;Renato Martins Assunção;Martin Kulldorff

  • Affiliations:
  • Department of Statistics, Universidade Federal de Minas Gerais, Brazil;Department of Statistics, Universidade Federal de Minas Gerais, Brazil;Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, United States

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2012

Abstract

Spatial clustering methodologies that are capable of detecting and delineating irregular clusters can provide information about the geographical spread of various diseases under surveillance. This paper proposes and compares three spatial scan statistics designed to detect clusters with irregular shapes. The proposed methods use geographical boundary information to construct a graph in which a cluster growing process is performed based on likelihood function maximization. Constraints on cluster shape are imposed through early stopping, a double connection requirement and a maximum linkage criterion. The methods are evaluated using simulated data sets with either circular or irregular clusters, and compared to the circular and elliptic scan statistics. Results show that for circular clusters, the standard circular scan statistic is optimal, as expected. The double connection, elliptic and maximum linkage scan statistics also achieve good results. For irregularly-shaped clusters, the elliptic, maximum linkage and double connected scan statistics are optimal for different cluster models and by different evaluation criteria, but the circular scan statistic also performs well. If the emphasis is on statistical power for cluster detection, the simple circular scan statistic is an attractive across-the-board choice. If the emphasis is on the accurate determination of cluster size, shape and boundaries, the double connected, maximum linkage and elliptic scan statistics are often more suitable choices. All methods perform well, with the exception of the unrestricted dynamic minimum spanning tree scan statistic and the early stopping scan statistic, which we do not recommend.
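
The cluster growing process mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a Poisson model with the standard Kulldorff log-likelihood ratio, an adjacency dictionary derived from shared boundaries, and plain greedy growth with none of the shape constraints (early stopping, double connection, maximum linkage) that the paper studies. The names `poisson_llr`, `grow_cluster` and the toy data are hypothetical.

```python
import math

def poisson_llr(c, e, total):
    """Kulldorff-style Poisson log-likelihood ratio for a candidate zone.

    c: observed cases in the zone, e: expected cases in the zone (assumed > 0),
    total: total cases on the map (expected counts are assumed to sum to total).
    Only zones with an excess of cases (c > e) score above zero.
    """
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside = (total - c) * math.log((total - c) / (total - e)) if total > c else 0.0
    return inside + outside

def grow_cluster(seed, adjacency, cases, expected, max_size):
    """Greedy, unconstrained growth: repeatedly add the neighbouring area
    that gives the highest log-likelihood ratio, and remember the best zone seen."""
    total = sum(cases.values())
    zone = [seed]
    c, e = cases[seed], expected[seed]
    best_zone, best_llr = list(zone), poisson_llr(c, e, total)
    while len(zone) < max_size:
        # Areas that share a boundary with the current zone but are not in it.
        frontier = {n for a in zone for n in adjacency[a]} - set(zone)
        if not frontier:
            break
        # sorted() makes tie-breaking deterministic.
        nxt = max(sorted(frontier),
                  key=lambda n: poisson_llr(c + cases[n], e + expected[n], total))
        zone.append(nxt)
        c += cases[nxt]
        e += expected[nxt]
        llr = poisson_llr(c, e, total)
        if llr > best_llr:
            best_zone, best_llr = list(zone), llr
    return best_zone, best_llr

# Toy map (illustrative only): five areas in a row, excess cases in A and B.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
cases = {"A": 12, "B": 12, "C": 2, "D": 2, "E": 2}
expected = {"A": 6, "B": 6, "C": 6, "D": 6, "E": 6}
zone, llr = grow_cluster("A", adjacency, cases, expected, max_size=5)
print(zone, round(llr, 2))
```

In a full scan statistic, this growth would be repeated from every seed area and the significance of the best zone assessed by Monte Carlo replication; the shape constraints compared in the paper would restrict which frontier areas may be added at each step.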