SCALE: a scalable framework for efficiently clustering transactional data
Data Mining and Knowledge Discovery
Developing efficient, fully automated algorithms for clustering large transactional datasets is widely recognized as a challenging problem. In this paper, we propose a fast, memory-efficient, and scalable clustering algorithm for transactional data. Our approach has three distinctive features. First, we use Weighted Coverage Density as a categorical similarity measure for efficient clustering of transactional datasets. The measure is intuitive and lets the weight of each item in a cluster change dynamically according to how often that item occurs. Second, we develop two evaluation metrics specific to transactional data clustering, one based on large transactional items and the other on coverage density. Third, we implement the weighted coverage density clustering algorithm and the two cluster validation metrics in a fully automated transactional clustering framework called SCALE (Sampling, Clustering structure Assessment, cLustering and domain-specific Evaluation). SCALE combines weighted coverage density clustering over a sample dataset with self-configuring methods that automate two key decisions: (1) generating candidates for the best number K of clusters; and (2) applying the two domain-specific cluster validity measures to select the best result from the set of clustering results. Experiments on both synthetic and real datasets show that the weighted coverage density approach, powered by the SCALE framework, efficiently generates high-quality clustering results in a fully automated manner.
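The abstract does not state the formula for weighted coverage density, so the following is only a minimal sketch under an assumed formulation: each item in a cluster is weighted by its share of all item occurrences in that cluster, which gives WCD = (sum of squared item occurrence counts) / (total occurrences × number of transactions). The function names `weighted_coverage_density` and `expected_wcd`, and the size-weighted average used to compare partitions, are hypothetical illustrations and not the authors' implementation.

```python
from collections import Counter


def weighted_coverage_density(cluster):
    """Assumed WCD of one cluster, given as a list of transactions (item sets).

    Each item's weight is its occurrence count divided by the total number of
    item occurrences in the cluster, so frequent items dominate the score.
    """
    # Count in how many transactions each item appears.
    occurrences = Counter(item for txn in cluster for item in set(txn))
    n_transactions = len(cluster)
    total_occurrences = sum(occurrences.values())
    if n_transactions == 0 or total_occurrences == 0:
        return 0.0
    squared = sum(count * count for count in occurrences.values())
    return squared / (total_occurrences * n_transactions)


def expected_wcd(partition):
    """Size-weighted average WCD over all clusters (an assumed criterion)."""
    n_total = sum(len(cluster) for cluster in partition)
    if n_total == 0:
        return 0.0
    return sum(len(c) * weighted_coverage_density(c) for c in partition) / n_total


# Two candidate partitions of the same four transactions.
pure = [[{"a", "b"}, {"a", "b"}], [{"c", "d"}, {"c", "d"}]]
mixed = [[{"a", "b"}, {"c", "d"}], [{"a", "b"}, {"c", "d"}]]
print(expected_wcd(pure), expected_wcd(mixed))
```

Under this assumed definition, a partition that groups identical transactions together scores higher than one that mixes unrelated transactions, which is the behavior a similarity measure for transactional clustering would be expected to have.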