CLOPE: a fast and effective clustering algorithm for transactional data

Authors:
Yiling Yang; Xudong Guan; Jinyuan You
Affiliations:
Shanghai Jiao Tong University, Shanghai, P.R. China; Shanghai Jiao Tong University, Shanghai, P.R. China; Shanghai Jiao Tong University, Shanghai, P.R. China
Venue:
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2002


Abstract

This paper studies the problem of categorical data clustering, especially for transactional data characterized by high dimensionality and large volume. Starting from a heuristic method of increasing the height-to-width ratio of the cluster histogram, we develop a novel algorithm, CLOPE, which is very fast and scalable, while being quite effective. We demonstrate the performance of our algorithm on two real-world datasets, and compare CLOPE with state-of-the-art algorithms.
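
The height-to-width heuristic mentioned in the abstract can be illustrated with a small sketch. The abstract does not define the histogram, so the definitions here are assumptions: the histogram of a cluster counts how many transactions contain each item, its width is the number of distinct items, and its height is the total number of item occurrences divided by the width. The function names and the toy transactions are hypothetical, and this is not the CLOPE algorithm itself, only the quantity the heuristic tries to increase.

```python
from collections import Counter


def histogram_stats(cluster):
    """Return (size, width, height) of a cluster's item histogram.

    Assumed definitions (not given in the abstract):
      size   = total item occurrences over all transactions
      width  = number of distinct items in the cluster
      height = size / width
    """
    # Each transaction is treated as a set, so repeated items count once.
    counts = Counter(item for txn in cluster for item in set(txn))
    size = sum(counts.values())
    width = len(counts)
    height = size / width if width else 0.0
    return size, width, height


def height_to_width(cluster):
    """Height-to-width ratio of the cluster histogram (0.0 if empty)."""
    _, width, height = histogram_stats(cluster)
    return height / width if width else 0.0


# Hypothetical toy clusters: transactions sharing many items give a
# taller, narrower histogram than transactions sharing few items.
overlapping = [["a", "b"], ["a", "b", "c"], ["a", "c", "d"]]
scattered = [["a", "c", "d"], ["d", "e"], ["d", "e", "f"]]

print(histogram_stats(overlapping), height_to_width(overlapping))
print(histogram_stats(scattered), height_to_width(scattered))
```

Under these assumptions, the cluster whose transactions overlap more has the larger ratio, which is the intuition behind preferring it.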