Adherence clustering: an efficient method for mining market-basket clusters

Authors:
Ching-Huang Yun;Kun-Ta Chuang;Ming-Syan Chen
Affiliations:
Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, ROC;Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, ROC;Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, ROC
Venue:
Information Systems
Year:
2006



Abstract

This paper explores the efficient clustering of market-basket data. Unlike traditional data, market-basket data are known to be high-dimensional and sparse. Because they do not explicitly consider the taxonomy, most prior efforts on clustering market-basket data can be viewed as dealing only with items at the leaf level of the taxonomy tree. Clustering transactions across different levels of the taxonomy is of great importance both for marketing strategies and for representing the results of clustering techniques for market-basket data. In view of these features, we devise a novel measurement, called the category-based adherence, and use it to perform the clustering. With this measurement, we develop an efficient clustering algorithm for market-basket data, called algorithm k-todes, whose objective is to minimize the category-based adherence. The distance of an item to a given cluster is defined as the number of links between this item and its nearest tode. The category-based adherence of a transaction to a cluster is then defined as the average distance of the items in this transaction to that cluster. A validation model based on information gain is also devised to assess the quality of clustering for market-basket data. Experimental results on both real and synthetic datasets show that, with the taxonomy information, algorithm k-todes significantly outperforms prior works in both execution efficiency and clustering quality as measured by information gain, indicating the usefulness of category-based adherence in market-basket data clustering.
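
The two definitions in the abstract (item-to-cluster distance and category-based adherence) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define a "tode", so the sketch assumes a tode is a taxonomy node that represents a cluster, and that "number of links" means the path length between two nodes in the taxonomy tree. The toy taxonomy, the function names, and the nearest-cluster assignment step are all hypothetical and serve only as illustration.

```python
from collections import deque

# Hypothetical toy taxonomy, stored as child -> parent (the root has no parent).
PARENT = {
    "root": None,
    "drinks": "root",
    "snacks": "root",
    "cola": "drinks",
    "juice": "drinks",
    "chips": "snacks",
}


def build_adjacency(parent):
    """Turn the child -> parent map into an undirected adjacency map."""
    adj = {node: set() for node in parent}
    for child, par in parent.items():
        if par is not None:
            adj[child].add(par)
            adj[par].add(child)
    return adj


def item_distance(item, todes, adj):
    """Number of links between an item and its nearest tode.

    Assumption: links are counted along the tree path, found here by
    breadth-first search starting from the item.
    """
    todes = set(todes)
    seen = {item}
    queue = deque([(item, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in todes:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"no tode reachable from {item!r}")


def adherence(transaction, todes, adj):
    """Category-based adherence: average item distance to the cluster."""
    return sum(item_distance(i, todes, adj) for i in transaction) / len(transaction)


def assign(transaction, clusters, adj):
    """Pick the cluster with the smallest adherence (assumed assignment rule)."""
    return min(clusters, key=lambda name: adherence(transaction, clusters[name], adj))


if __name__ == "__main__":
    adj = build_adjacency(PARENT)
    clusters = {"A": {"drinks"}, "B": {"snacks"}}  # hypothetical todes per cluster
    basket = ["cola", "juice"]
    print(adherence(basket, clusters["A"], adj))
    print(assign(basket, clusters, adj))
```

In this sketch a lower adherence means a transaction sits closer to a cluster in the taxonomy, which matches the stated objective of minimizing the category-based adherence. How k-todes selects and updates the todes is not described in the abstract and is not modeled here.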