Hamming Distance based Clustering Algorithm

Authors:
Ritu Vijay;Prerna Mahajan;Rekha Kandwal
Affiliations:
Banasthali University, India;Prerna Mahajan, Research Scholar, Banasthali University, India;Ministry of Earth Sciences & Science and Technology, India
Venue:
International Journal of Information Retrieval Research
Year:
2012

Abstract

Cluster analysis has been used extensively in machine learning and data mining to discover distribution patterns in data. Clustering algorithms generally rely on a distance metric to partition the data into small groups such that instances in the same group are more similar to one another than to instances in other groups. In this paper the authors extend the concept of Hamming distance to categorical data. As a data processing step, they transform the data into a binary representation. They then use the proposed algorithm to group the data points into clusters. Experiments are carried out on data sets from the UCI Machine Learning Repository to analyze the algorithm's performance. The authors conclude that the proposed algorithm shows promising results and can be extended to handle numeric as well as mixed data.
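
To make the idea in the abstract concrete, the following is a minimal sketch of turning categorical records into binary vectors and comparing them with Hamming distance. The abstract does not specify the encoding or the clustering procedure, so the one-bit-per-(attribute, value) encoding, the function names `build_vocabulary`, `to_binary`, and `hamming`, and the toy records are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Sequence, Tuple


def build_vocabulary(records: Sequence[Sequence[str]]) -> List[Tuple[int, str]]:
    """Collect every (attribute index, value) pair seen in the data, sorted."""
    return sorted({(i, v) for row in records for i, v in enumerate(row)})


def to_binary(row: Sequence[str], vocab: List[Tuple[int, str]]) -> List[int]:
    """Encode a categorical record with one bit per (attribute, value) pair.

    Assumption: a one-hot style encoding; the paper's actual binary
    representation is not described in the abstract.
    """
    present = set(enumerate(row))
    return [1 if pair in present else 0 for pair in vocab]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy data: three records with three categorical attributes each.
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "square"),
]

vocab = build_vocabulary(records)
encoded = [to_binary(r, vocab) for r in records]

# Pairwise distances between the encoded records.
for i in range(len(encoded)):
    for j in range(i + 1, len(encoded)):
        print(records[i], records[j], hamming(encoded[i], encoded[j]))
```

Under this particular encoding, every attribute on which two records disagree contributes two differing bits, so the distance is twice the number of mismatched attributes. Any distance-based clustering step could then operate on these vectors; which one the paper uses is not stated in the abstract.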