This paper presents Squeezer, a new efficient algorithm for clustering categorical data that produces high-quality clustering results while scaling well. Squeezer reads each tuple t in sequence and either assigns t to an existing cluster (initially there are none) or creates a new cluster containing t; the choice is determined by the similarity between t and the existing clusters. Because it processes tuples one at a time in this way, the algorithm is well suited to clustering data streams, where, given a sequence of points, the objective is to maintain a consistently good clustering of the sequence so far using a small amount of memory and time. Squeezer can also handle outliers efficiently and directly. Experimental results on real-life and synthetic datasets verify the superiority of Squeezer.
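
The one-pass assignment loop described in the abstract can be illustrated with a minimal sketch. The abstract does not define the similarity measure or the decision rule, so the choices here are assumptions made for illustration: each cluster keeps per-attribute value counts, the similarity between a tuple and a cluster is the sum over attributes of the fraction of cluster members sharing the tuple's value, and a tuple joins the most similar cluster only if that similarity reaches a user-supplied `threshold`. The names `ClusterSummary` and `squeezer_sketch` are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Sequence, Tuple


class ClusterSummary:
    """Per-attribute value counts for one cluster (hypothetical summary structure)."""

    def __init__(self, n_attrs: int) -> None:
        self.size = 0
        self.counts: List[Counter] = [Counter() for _ in range(n_attrs)]

    def add(self, t: Sequence[Hashable]) -> None:
        """Absorb tuple t into the summary."""
        self.size += 1
        for counter, value in zip(self.counts, t):
            counter[value] += 1

    def similarity(self, t: Sequence[Hashable]) -> float:
        # Assumed measure (not stated in the abstract): for each attribute,
        # the fraction of cluster members that share t's value, summed.
        return sum(c[v] / self.size for c, v in zip(self.counts, t))


def squeezer_sketch(
    tuples: Iterable[Sequence[Hashable]], threshold: float
) -> Tuple[List[int], List[ClusterSummary]]:
    """Single pass: assign each tuple to an existing cluster or start a new one."""
    clusters: List[ClusterSummary] = []
    labels: List[int] = []
    for t in tuples:
        # Find the most similar existing cluster (none exist at the start).
        best_idx, best_sim = -1, float("-inf")
        for idx, cluster in enumerate(clusters):
            sim = cluster.similarity(t)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        # Assumed rule: open a new cluster when nothing is similar enough.
        if best_idx == -1 or best_sim < threshold:
            clusters.append(ClusterSummary(len(t)))
            best_idx = len(clusters) - 1
        clusters[best_idx].add(t)
        labels.append(best_idx)
    return labels, clusters
```

Under these assumptions, memory grows with the number of clusters and distinct attribute values rather than with the number of tuples, which is what makes a loop of this shape a natural fit for streams. One possible way to surface outliers is to flag clusters whose `size` stays very small after the pass, although the abstract does not say how Squeezer itself handles them.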