A method to compute distance between two categorical values of same attribute in unsupervised learning for categorical data set

Authors:
Amir Ahmad; Lipika Dey
Affiliations:
MEMS Group, Solid State Physics Lab, Timarpur, Delhi 54, India; Mathematics Department, Indian Institute of Technology, Hauz Khas, Delhi 16, India
Venue:
Pattern Recognition Letters
Year:
2007

Abstract

Computing the similarity between categorical data objects in unsupervised learning is an important data mining problem. We propose a method to compute the distance between two values of the same attribute for unsupervised learning. The approach is based on the observation that the similarity of two attribute values depends on their relationship with the other attributes. The computational cost of the method is linear in the number of data objects in the data set. To evaluate the effectiveness of the proposed distance measure, we use it with the K-mode clustering algorithm to cluster various categorical data sets. A significant improvement in clustering accuracy is observed compared with the results of the traditional K-mode clustering algorithm.
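The abstract does not give the distance formula, so the following is only a minimal sketch of a co-occurrence-based value distance in the spirit it describes, not the authors' exact method. It assumes one formulation: for two values x, y of attribute A_i and each other attribute A_j, d_ij(x, y) = sum over values u of A_j of max(P(u|x), P(u|y)) - 1, and the final distance averages d_ij over all j != i. The k-modes loop swaps 0/1 simple matching for this distance but keeps the usual most-frequent-value mode update. Whether the paper updates cluster centres this way is an assumption. All function names (`value_distance`, `k_modes`, etc.), the seed-row initialisation, and the toy data are hypothetical. The sketch also assumes at least two attributes and sortable, hashable values.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(data, i, j):
    """Single pass over the rows: counts[x][u] = #rows with A_i == x and A_j == u."""
    counts = defaultdict(Counter)
    for row in data:
        counts[row[i]][row[j]] += 1
    return counts


def value_distance(data, i):
    """Distance between every pair of values of attribute i (assumed formulation).

    For each other attribute j: d_ij(x, y) = sum_u max(P(u|x), P(u|y)) - 1,
    which lies in [0, 1]. The result averages d_ij over all j != i.
    """
    others = [j for j in range(len(data[0])) if j != i]
    # Row passes (here and in the co-occurrence tables) are linear in len(data);
    # the pair loop below depends only on the number of distinct values.
    values = sorted({row[i] for row in data})
    tables = {j: cooccurrence_counts(data, i, j) for j in others}
    dist = {(x, x): 0.0 for x in values}
    for x, y in combinations(values, 2):
        total = 0.0
        for j in others:
            cx, cy = tables[j][x], tables[j][y]
            nx, ny = sum(cx.values()), sum(cy.values())
            # Counter returns 0 for values of A_j never seen with x (or y).
            total += sum(max(cx[u] / nx, cy[u] / ny) for u in set(cx) | set(cy)) - 1.0
        dist[(x, y)] = dist[(y, x)] = total / len(others)
    return dist


def object_distance(obj, mode, dists):
    """Sum of per-attribute value distances; replaces 0/1 simple matching."""
    return sum(dists[a][(v, mv)] for a, (v, mv) in enumerate(zip(obj, mode)))


def k_modes(data, dists, init_rows, max_iter=10):
    """Plain k-modes loop with a pluggable value distance.

    Initial modes are the rows listed in init_rows (a simplifying assumption).
    """
    modes = [list(data[r]) for r in init_rows]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(modes)), key=lambda c: object_distance(row, modes[c], dists))
            for row in data
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(len(modes)):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:
                # Mode update: most frequent value per attribute (ties: first seen).
                modes[c] = [Counter(col).most_common(1)[0][0] for col in zip(*members)]
    return labels, modes


data = [
    ("red", "apple", "sweet"),
    ("red", "apple", "sweet"),
    ("green", "apple", "sour"),
    ("green", "lime", "sour"),
    ("yellow", "lemon", "sour"),
    ("yellow", "lemon", "sour"),
]
dists = [value_distance(data, a) for a in range(len(data[0]))]
labels, modes = k_modes(data, dists, init_rows=[0, 4])
print(dists[0][("green", "yellow")], dists[0][("green", "red")])
print(labels, modes)
```

On this toy data, simple matching would put every pair of distinct colours at distance 1. The sketch instead places green closer to yellow (0.5) than to red (0.75), because green and yellow both co-occur only with sour. Building the co-occurrence tables is the only step that scans the rows, which is consistent with the linear cost in the number of objects stated above.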