Efficiently clustering transactional data with weighted coverage density

Authors:
Hua Yan;Keke Chen;Ling Liu
Affiliations:
University of Electronic Science & Tech. of China, Chengdu, P.R. ChinaUniversity of Electronic Science & Tech. of China, Chengdu, P.R. China;Georgia Institute of Technology, Atlanta, GA;Georgia Institute of Technology, Atlanta, GA
Venue:
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Year:
2006

Citing 20
Cited 6

CACTUS—clustering categorical data using summaries

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Clustering transactions using large items

Proceedings of the eighth international conference on Information and knowledge management
Data clustering: a review

ACM Computing Surveys (CSUR)
Co-clustering documents and words using bipartite spectral graph partitioning

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Bipartite graph partitioning and data clustering

Proceedings of the tenth international conference on Information and knowledge management
Cluster validity methods: part I

ACM SIGMOD Record
COOLCAT: an entropy-based algorithm for categorical clustering

Proceedings of the eleventh international conference on Information and knowledge management
Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values

Data Mining and Knowledge Discovery
Finding Localized Associations in Market Basket Data

IEEE Transactions on Knowledge and Data Engineering
A Min-max Cut Algorithm for Graph Partitioning and Data Clustering

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Clustering Categorical Data: An Approach Based on Dynamical Systems

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Massive Quasi-Clique Detection

LATIN '02 Proceedings of the 5th Latin American Symposium on Theoretical Informatics
CLOPE: a fast and effective clustering algorithm for transactional data

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
ROCK: A Robust Clustering Algorithm for Categorical Attributes

ICDE '99 Proceedings of the 15th International Conference on Data Engineering
Fully automatic cross-associations

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Entropy-based criterion in categorical clustering

ICML '04 Proceedings of the twenty-first international conference on Machine learning
VISTA: validating and refining clusters via visualization

Information Visualization
The "Best K" for entropy-based categorical data clustering

SSDBM'2005 Proceedings of the 17th international conference on Scientific and statistical database management
Clustering categorical data using coverage density

ADMA'05 Proceedings of the First international conference on Advanced Data Mining and Applications

Determining the best K for clustering transactional datasets: A coverage density-based approach

Data & Knowledge Engineering
SCALE: a scalable framework for efficiently clustering transactional data

Data Mining and Knowledge Discovery
Discovering Knowledge-Sharing Communities in Question-Answering Forums

ACM Transactions on Knowledge Discovery from Data (TKDD)
A practical approach for clustering transaction data

MLDM'11 Proceedings of the 7th international conference on Machine learning and data mining in pattern recognition
DHCC: Divisive hierarchical clustering of categorical data

Data Mining and Knowledge Discovery
Clustering of heterogeneously typed data with soft computing - a case study

MICAI'11 Proceedings of the 10th international conference on Artificial Intelligence: advances in Soft Computing - Volume Part II

Quantified Score

Hi-index	0.00

Visualization

Abstract

It is widely recognized that developing efficient and fully automated algorithms for clustering large transactional datasets is a challenging problem. In this paper, we propose a fast, memory-efficient, and scalable clustering algorithm for analyzing transactional data. Our approach has three unique features. First, we use the concept of Weighted Coverage Density as a categorical similarity measure for efficient clustering of transactional datasets. The concept of weighted coverage density is intuitive and allows the weight of each item in a cluster to be changed dynamically according to the occurrences of items. Second, we develop two transactional data clustering specific evaluation metrics based on the concept of large transactional items and the coverage density respectively. Third, we implement the weighted coverage density clustering algorithm and the two clustering validation metrics using a fully automated transactional clustering framework, called SCALE (Sampling, Clustering structure Assessment, cLustering and domain-specific Evaluation). The SCALE framework is designed to combine the weighted coverage density measure for clustering over a sample dataset with self-configuring methods that can automatically tune the two important parameters of the clustering algorithms: (1) the candidates of the best number K of clusters; and (2) the application of two domain-specific cluster validity measures to find the best result from the set of clustering results. We have conducted experimental evaluation using both synthetic and real datasets and our results show that the weighted coverage density approach powered by the SCALE framework can efficiently generate high quality clustering results in a fully automated manner.