QROCK: A quick version of the ROCK algorithm for clustering of categorical data

Authors:
M. Dutta; A. Kakoti Mahanta; Arun K. Pujari
Affiliations:
Department of Information Technology, Tezpur University, Tezpur 784 028, India; Department of Computer Science, Guwahati University, Gopi Nath Bordoloi Nagar, Jalukbari, Guwahati, Assam 781 014, India; Department of CIS, University of Hyderabad, Hyderabad 500 046, India
Venue:
Pattern Recognition Letters
Year:
2005

References
Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS).
BIRCH: an efficient data clustering method for very large databases. SIGMOD '96: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data.
CURE: an efficient clustering algorithm for large databases. SIGMOD '98: Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data.
CACTUS: clustering categorical data using summaries. KDD '99: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Chameleon: Hierarchical Clustering Using Dynamic Modeling. Computer.
Clustering Categorical Data: An Approach Based on Dynamical Systems. VLDB '98: Proceedings of the 24th International Conference on Very Large Data Bases.
Clustering Large Datasets in Arbitrary Metric Spaces. ICDE '99: Proceedings of the 15th International Conference on Data Engineering.
ROCK: A Robust Clustering Algorithm for Categorical Attributes. ICDE '99: Proceedings of the 15th International Conference on Data Engineering.

Abstract

The ROCK algorithm is an agglomerative hierarchical clustering algorithm for categorical data [Guha, S., Rastogi, R., Shim, K., 1999. ROCK: A robust clustering algorithm for categorical attributes. In: Proc. IEEE Internat. Conf. Data Engineering, Sydney, March 1999]. In this paper we prove that, under certain conditions, the final clusters produced by the algorithm are exactly the connected components of a particular graph whose vertices are the input data points. We propose a new algorithm, QROCK, which computes the clusters by determining the connected components of this graph. This yields a very efficient method of obtaining the clusters and drastically reduces the computing time of the ROCK algorithm. We also justify why it is more practical to specify a similarity threshold than to specify the desired number of clusters a priori. In the process, QROCK also detects outliers. Finally, we discuss a new similarity measure for categorical attributes.
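The central idea of the abstract, that the clusters can be read off as connected components of a graph over the data points, can be illustrated with a minimal sketch. It assumes that two records are joined by an edge when their similarity is at least a threshold theta, and it uses the Jaccard coefficient over (attribute index, value) items as the similarity. That is a common choice for categorical records, not the new measure the paper proposes. Treating singleton components as outliers is likewise an illustrative assumption, and the names `jaccard`, `to_items` and `qrock_sketch` are hypothetical. Components are found with a union-find structure instead of ROCK's link computation and repeated merging.

```python
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard coefficient |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def to_items(record: tuple) -> frozenset:
    """Encode a categorical record as a set of (attribute index, value) items."""
    return frozenset(enumerate(record))


def qrock_sketch(records, theta):
    """Hypothetical sketch: clusters are connected components of the graph
    whose edges join records with jaccard similarity >= theta.

    Returns (clusters, outliers); components of size 1 are reported as
    outliers (an assumption made for illustration).
    """
    items = [to_items(r) for r in records]
    n = len(items)
    parent = list(range(n))  # union-find forest over record indices

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Naive O(n^2) scan of all pairs; each edge merges two components.
    for i, j in combinations(range(n), 2):
        if jaccard(items[i], items[j]) >= theta:
            parent[find(i)] = find(j)

    components = {}
    for i in range(n):
        components.setdefault(find(i), []).append(i)

    clusters = [c for c in components.values() if len(c) > 1]
    outliers = [c[0] for c in components.values() if len(c) == 1]
    return clusters, outliers


# Usage on a toy data set of three categorical attributes.
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("green", "tiny", "star"),
]
clusters, outliers = qrock_sketch(records, theta=0.5)
print(clusters, outliers)  # → [[0, 1, 2]] [3, 4]
```

With theta = 0.5, records 0 and 1 share two of four distinct items (similarity 0.5), as do records 0 and 2, so all three fall into one component even though records 1 and 2 are not directly joined. Lowering theta to 0.2 also pulls record 3 into that component and leaves only record 4 isolated. This shows how the threshold, rather than a preset number of clusters, determines the result.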