BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Efficiently supporting ad hoc queries in large datasets of time sequences
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
CURE: an efficient clustering algorithm for large databases
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
OPTICS: ordering points to identify the clustering structure
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Optimal partial-match retrieval when fields are independently specified
ACM Transactions on Database Systems (TODS)
Finding generalized projected clusters in high dimensional spaces
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications
Data Mining and Knowledge Discovery
Approximating a collection of frequent sets
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Automatic Subspace Clustering of High Dimensional Data
Data Mining and Knowledge Discovery
Summarizing itemset patterns: a profile-based approach
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Hi-index | 0.00 |
Given a table where rows correspond to records and columns correspond to attributes, we want to find a small number of patterns that succinctly summarize the dataset. For example, given a set of patient records with several attributes each, how can we find (a) that the "most representative" pattern is, say, (male, adult, *), followed by (*, child, low-cholesterol), etc? We propose TSum, a method that provides a sequence of patterns ordered by their "representativeness." It can decide both which these patterns are, as well as how many are necessary to properly summarize the data. Our main contribution is formulating a general framework, TSum, using compression principles. TSum can easily accommodate different optimization strategies for selecting and refining patterns. The discovered patterns can be used to both represent the data efficiently, as well as interpret it quickly. Extensive experiments demonstrate the effectiveness and intuitiveness of our discovered patterns.