Rapid detection of significant spatial clusters

Authors:
Daniel B. Neill; Andrew W. Moore
Affiliations:
Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA
Venue:
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2004

Abstract

Given an N x N grid of squares, where each square has a count c_ij and an underlying population p_ij, our goal is to find the rectangular region with the highest density, and to calculate its significance by randomization. An arbitrary density function D, dependent on a region's total count C and total population P, can be used. For example, if each count represents the number of disease cases occurring in that square, we can use Kulldorff's spatial scan statistic D_K to find the most significant spatial disease cluster. A naive approach to finding the maximum density region requires O(N^4) time, and is generally computationally infeasible. We present a multiresolution algorithm which partitions the grid into overlapping regions using a novel overlap-kd tree data structure, bounds the maximum score of subregions contained in each region, and prunes regions which cannot contain the maximum density region. For sufficiently dense regions, this method finds the maximum density region in O((N log N)^2) time, in practice resulting in significant (20-2000x) speedups on both real and simulated datasets.
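
To make the problem statement concrete, the following is a minimal Python sketch of the naive O(N^4) baseline that the abstract describes, not the overlap-kd tree algorithm itself. It rests on several assumptions not stated in the abstract: the density function is taken to be the commonly stated Poisson log-likelihood-ratio form of Kulldorff's statistic, 2D prefix sums are used so each rectangle's C and P can be read in constant time, and the randomization test redistributes the total count across cells in proportion to population. All function names (`prefix_sums`, `rect_sum`, `kulldorff`, `naive_best_region`, `randomization_p_value`) are hypothetical.

```python
import math
import random


def prefix_sums(grid):
    """Return an (n+1) x (n+1) table S where S[i][j] sums grid[0:i][0:j]."""
    n = len(grid)
    S = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            S[i + 1][j + 1] = grid[i][j] + S[i][j + 1] + S[i + 1][j] - S[i][j]
    return S


def rect_sum(S, r0, r1, c0, c1):
    """Sum over rows r0..r1-1 and columns c0..c1-1 in O(1)."""
    return S[r1][c1] - S[r0][c1] - S[r1][c0] + S[r0][c0]


def kulldorff(C, P, C_tot, P_tot):
    """Assumed form of D_K: log-likelihood ratio, zero unless the region
    has a higher rate inside than the grid has overall."""
    if P <= 0 or P >= P_tot or C / P <= C_tot / P_tot:
        return 0.0

    def xlogy(x, y):
        return x * math.log(y) if x > 0 else 0.0

    inside = xlogy(C, C / P)
    outside = xlogy(C_tot - C, (C_tot - C) / (P_tot - P))
    return inside + outside - xlogy(C_tot, C_tot / P_tot)


def naive_best_region(counts, pops):
    """Score every axis-aligned rectangle: O(N^4) rectangles, O(1) each."""
    n = len(counts)
    Sc, Sp = prefix_sums(counts), prefix_sums(pops)
    C_tot, P_tot = Sc[n][n], Sp[n][n]
    best_score, best_region = 0.0, None
    for r0 in range(n):
        for r1 in range(r0 + 1, n + 1):
            for c0 in range(n):
                for c1 in range(c0 + 1, n + 1):
                    C = rect_sum(Sc, r0, r1, c0, c1)
                    P = rect_sum(Sp, r0, r1, c0, c1)
                    score = kulldorff(C, P, C_tot, P_tot)
                    if score > best_score:
                        best_score, best_region = score, (r0, r1, c0, c1)
    return best_score, best_region


def randomization_p_value(counts, pops, replicas=99, seed=0):
    """Estimate significance by comparing the observed maximum score with
    the maximum score on replica grids generated under the null hypothesis
    (assumed here: cases fall in cells in proportion to population)."""
    rng = random.Random(seed)
    n = len(counts)
    observed, region = naive_best_region(counts, pops)
    cells = [(i, j) for i in range(n) for j in range(n)]
    weights = [pops[i][j] for i, j in cells]
    total = int(sum(map(sum, counts)))
    beat = 0
    for _ in range(replicas):
        sim = [[0] * n for _ in range(n)]
        for i, j in rng.choices(cells, weights=weights, k=total):
            sim[i][j] += 1
        if naive_best_region(sim, pops)[0] >= observed:
            beat += 1
    return observed, region, (beat + 1) / (replicas + 1)


if __name__ == "__main__":
    # Toy 4 x 4 grid with uniform population and an elevated 2 x 2 block.
    counts = [[1] * 4 for _ in range(4)]
    for i in (1, 2):
        for j in (1, 2):
            counts[i][j] = 10
    pops = [[100] * 4 for _ in range(4)]
    print(randomization_p_value(counts, pops, replicas=19))
```

Regions are reported as half-open index ranges `(r0, r1, c0, c1)`. Because every replica repeats the full search, the cost of the naive scan is multiplied by the number of replicas, which is why pruning of the kind the abstract describes matters in practice.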