Parameter-Free Spatial Data Mining Using MDL

Authors:
Spiros Papadimitriou;Aristides Gionis;Panayiotis Tsaparas;Risto A. Vaisanen;Heikki Mannila;Christos Faloutsos
Affiliations:
Carnegie Mellon University;University of Helsinki;University of Helsinki;University of Helsinki;University of Helsinki;Carnegie Mellon University
Venue:
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Year:
2005


Abstract

Consider spatial data consisting of a set of binary features taking values over a collection of spatial extents (grid cells). We propose a method that simultaneously finds spatial correlation and feature co-occurrence patterns without requiring any user-specified parameters. In particular, we employ the Minimum Description Length (MDL) principle, coupled with a natural way of compressing regions. This defines what "good" means: a feature co-occurrence pattern is good if it helps us better compress the set of locations for these features. Conversely, a spatial correlation is good if it helps us better compress the set of features in the corresponding region. Our approach scales to large datasets, in both the number of locations and the number of features. We evaluate our method on both real and synthetic datasets.
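
The compression criterion described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a binary cell-by-feature matrix, takes a grouping of cells and of features as given, and approximates the cost of each resulting block by its binary entropy. The function names (`binary_entropy`, `block_cost_bits`, `grouping_cost_bits`) are hypothetical, and the sketch leaves out both the cost of describing the groups themselves and the spatial compression of regions, which a full MDL formulation would need.

```python
import math
from typing import Dict, List, Sequence, Tuple


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) source; 0 for a constant source."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def block_cost_bits(ones: int, size: int) -> float:
    """Approximate bits needed to encode `size` binary entries containing `ones` ones."""
    if size == 0:
        return 0.0
    return size * binary_entropy(ones / size)


def grouping_cost_bits(
    matrix: Sequence[Sequence[int]],
    cell_groups: Sequence[int],
    feature_groups: Sequence[int],
) -> float:
    """Data cost of a binary matrix under a given grouping (assumption: model cost ignored).

    matrix[i][j] is 1 if feature j is present in grid cell i.
    cell_groups[i] and feature_groups[j] are group labels.
    """
    counts: Dict[Tuple[int, int], List[int]] = {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            block = counts.setdefault((cell_groups[i], feature_groups[j]), [0, 0])
            block[0] += value  # number of ones in this block
            block[1] += 1      # number of entries in this block
    return sum(block_cost_bits(ones, size) for ones, size in counts.values())


# Toy data: two cells share features 0-1, two other cells share features 2-3.
toy = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

no_grouping = grouping_cost_bits(toy, [0, 0, 0, 0], [0, 0, 0, 0])
good_grouping = grouping_cost_bits(toy, [0, 0, 1, 1], [0, 0, 1, 1])
print(no_grouping, good_grouping)
```

On this toy matrix the single-block encoding costs 16 bits, while the grouping that matches the block structure costs 0 bits, which is the sense in which a grouping is "good". Because the sketch ignores the cost of describing the groups, it would always favor finer groupings; accounting for that model cost is what lets an MDL approach choose the number of groups without a user-set parameter.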