Turning Clusters into Patterns: Rectangle-Based Discriminative Data Description

Authors:
Byron J. Gao; Martin Ester
Affiliations:
Simon Fraser University, Canada; Simon Fraser University, Canada
Venue:
ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Year:
2006

Abstract

The ultimate goal of data mining is to extract knowledge from massive data. Knowledge is ideally represented as human-comprehensible patterns from which end-users can gain intuitions and insights. Yet not all data mining methods produce such readily understandable knowledge; for example, most clustering algorithms output clusters as sets of points. In this paper, we perform a systematic study of cluster description, which generates interpretable patterns from clusters. We introduce and analyze novel description formats that offer greater expressive power, and we motivate and define novel description problems that specify different trade-offs between interpretability and accuracy. We also present effective heuristic algorithms together with their empirical evaluation.
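The abstract does not spell out the paper's description formats or heuristics. As a rough illustration of what a rectangle-based discriminative description of a cluster can look like, the following Python sketch greedily covers a cluster's points with axis-parallel rectangles that exclude the points of other clusters. It then drops rectangles to show one way interpretability (fewer rectangles) can be traded against accuracy (here, the fraction of the cluster still covered). The function names `greedy_rectangle_cover` and `keep_top_k`, the nearest-first growth rule, and the coverage-based accuracy measure are hypothetical choices made for this example. They are not the authors' description formats or algorithms. The sketch also assumes that no point of another cluster coincides with a cluster point.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower corner, upper corner) of an axis-parallel rectangle


def bounding_box(points: Sequence[Point]) -> Box:
    """Smallest axis-parallel rectangle enclosing all given points."""
    lo = tuple(min(coords) for coords in zip(*points))
    hi = tuple(max(coords) for coords in zip(*points))
    return lo, hi


def contains(box: Box, p: Point) -> bool:
    """True if point p lies inside the closed rectangle."""
    lo, hi = box
    return all(low <= x <= high for low, x, high in zip(lo, p, hi))


def greedy_rectangle_cover(cluster: List[Point], others: List[Point]) -> List[Box]:
    """Hypothetical greedy heuristic (not the paper's algorithm): cover every
    cluster point with rectangles that contain no point of `others`."""
    uncovered = list(cluster)
    boxes: List[Box] = []
    while uncovered:
        seed = uncovered[0]
        members = [seed]
        # Try to absorb the remaining uncovered points nearest to the seed first.
        candidates = sorted(
            uncovered[1:],
            key=lambda q: sum((a - b) ** 2 for a, b in zip(seed, q)),
        )
        for q in candidates:
            grown = bounding_box(members + [q])
            # Accept the growth only if the rectangle stays free of other clusters' points.
            if not any(contains(grown, n) for n in others):
                members.append(q)
        box = bounding_box(members)
        boxes.append(box)
        # Every cluster point inside the new rectangle is now described.
        uncovered = [p for p in uncovered if not contains(box, p)]
    return boxes


def keep_top_k(boxes: List[Box], cluster: List[Point], k: int) -> Tuple[List[Box], float]:
    """Hypothetical trade-off: keep the k rectangles covering the most cluster
    points and report the fraction of the cluster that is still covered."""
    ranked = sorted(
        boxes, key=lambda b: sum(contains(b, p) for p in cluster), reverse=True
    )[:k]
    covered = sum(any(contains(b, p) for b in ranked) for p in cluster)
    return ranked, covered / len(cluster)


if __name__ == "__main__":
    cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
    others = [(3.0, 3.0)]  # a point belonging to a different cluster
    boxes = greedy_rectangle_cover(cluster, others)
    print(boxes)  # [((0.0, 0.0), (1.0, 1.0)), ((5.0, 5.0), (6.0, 5.0))]
    short, recall = keep_top_k(boxes, cluster, 1)
    print(short, recall)  # [((0.0, 0.0), (1.0, 1.0))] 0.6666666666666666
```

Growing each rectangle nearest-first keeps rectangles compact, so a dense group of cluster points tends to be summarized by a single rectangle. The price is that a greedy order can produce more rectangles than an optimal cover would.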