Anomaly detection has important applications in biosurveillance and environmental monitoring. When comparing measured data to data drawn from a baseline distribution, merely finding clusters in the measured data may not reveal true anomalies, since these clusters may simply be clusters of the baseline distribution. Hence, a discrepancy function is often used to examine how much the measured data differs from the baseline data within a region. An anomalous region is thus defined to be one with high discrepancy.

In this paper, we present algorithms for maximizing statistical discrepancy functions over the space of axis-parallel rectangles. We give provable approximation guarantees, both additive and relative, and our methods apply to any convex discrepancy function. Our algorithms work by connecting statistical discrepancy to combinatorial discrepancy; roughly speaking, we show that in order to maximize a convex discrepancy function over a class of shapes, one need only maximize a linear discrepancy function over the same set of shapes.

We derive general discrepancy functions for data generated from a one-parameter exponential family. This generalizes the widely used Kulldorff scan statistic for data from a Poisson distribution. We present an algorithm running in O((1/ε) n^2 log^2 n) time that computes the maximum discrepancy rectangle to within additive error ε for the Kulldorff scan statistic. Similar results hold for relative error and for discrepancy functions for data coming from Gaussian, Bernoulli, and gamma distributions. Prior to our work, the best known algorithms were exact and ran in time O(n^4).
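To make the problem statement concrete, the following is a minimal Python sketch of the exact O(n^4) baseline that the abstract compares against, not the approximation algorithm of the paper. It assumes the commonly used Poisson form of the Kulldorff discrepancy, d(m, b) = m ln(m/b) + (1 - m) ln((1 - m)/(1 - b)), where m and b are the fractions of measured and baseline weight falling inside a rectangle; this formula is not stated in the abstract. The names `kulldorff` and `max_discrepancy_rectangle`, the `(x, y, measured, baseline)` point layout, and the choice to score degenerate regions (b = 0 or b = 1) as 0 are all illustrative assumptions.

```python
import math


def kulldorff(m, b):
    """Assumed Poisson Kulldorff discrepancy for measured fraction m
    and baseline fraction b, both in [0, 1]."""
    if b <= 0.0 or b >= 1.0:
        return 0.0  # assumption: degenerate regions are scored as 0

    def term(p, q):
        # Convention: 0 * log(0 / q) = 0.
        return 0.0 if p <= 0.0 else p * math.log(p / q)

    return term(m, b) + term(1.0 - m, 1.0 - b)


def max_discrepancy_rectangle(points, d=kulldorff):
    """Exhaustive search over axis-parallel rectangles whose sides pass
    through input coordinates. points: list of (x, y, measured, baseline).
    Returns (best_score, (x_lo, x_hi, y_lo, y_hi))."""
    total_m = sum(p[2] for p in points)
    total_b = sum(p[3] for p in points)
    xs = sorted({p[0] for p in points})
    best_score, best_rect = 0.0, None

    for i, x_lo in enumerate(xs):
        for x_hi in xs[i:]:
            # Aggregate the weights in the vertical slab by y-coordinate.
            rows = {}
            for x, y, m, b in points:
                if x_lo <= x <= x_hi:
                    rm, rb = rows.get(y, (0, 0))
                    rows[y] = (rm + m, rb + b)
            ys = sorted(rows)
            # Sweep every (y_lo, y_hi) pair with running sums.
            for j, y_lo in enumerate(ys):
                m_in = b_in = 0
                for y_hi in ys[j:]:
                    m_in += rows[y_hi][0]
                    b_in += rows[y_hi][1]
                    score = d(m_in / total_m, b_in / total_b)
                    if score > best_score:
                        best_score = score
                        best_rect = (x_lo, x_hi, y_lo, y_hi)
    return best_score, best_rect


# Hypothetical data: a 4x4 grid with uniform baseline and an elevated
# measured count in the lower-left 2x2 corner.
grid = [(x, y, 5 if x < 2 and y < 2 else 1, 1)
        for x in range(4) for y in range(4)]
score, rect = max_discrepancy_rectangle(grid)
```

On this toy grid the search singles out the corner where measured counts exceed what the baseline predicts. There are O(n^2) choices of horizontal extent and O(n^2) choices of vertical extent, which is where the O(n^4) cost of the exact approach comes from.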