Clustering with Domain Value Dissimilarity for Categorical Data

Authors:
Jeonghoon Lee;Yoon-Joon Lee;Minho Park
Affiliations:
School of EECS, Division of Computer Science, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea 350-701;School of EECS, Division of Computer Science, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea 350-701;Information Technology Department, The Bank of Korea, Seoul, Republic of Korea 135-080
Venue:
ICDM '09 Proceedings of the 9th Industrial Conference on Advances in Data Mining. Applications and Theoretical Aspects
Year:
2009

Abstract

Clustering is a representative grouping process for uncovering hidden information and understanding the characteristics of a dataset before further analysis. The similarity and dissimilarity of objects are the fundamental deciding factors in clustering, and how they are measured dominates the quality of the results. When the attributes are categorical, it is not simple to quantify the dissimilarity of data objects that have unimportant attributes or synonymous values. We propose a new way to quantify the dissimilarity of objects using the distribution information of the data correlated with each categorical value. Our method discovers the intrinsic relationships among values and measures the dissimilarity of objects effectively. Because the approach is not tightly coupled to any clustering algorithm, it can be applied flexibly to various algorithms. Experiments on both synthetic and real datasets show the validity and effectiveness of the method. When our method is applied only to traditional clustering algorithms, the results are considerably better than those of previous methods.
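
The abstract does not state the dissimilarity formula, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes that two values of one attribute are compared by the distributions of values they co-occur with in the other attributes, and it uses the total variation distance for that comparison. The function names (conditional_distributions, value_dissimilarity, object_dissimilarity) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def conditional_distributions(rows, target, other):
    """For each value v in column `target`, estimate P(value in column `other` | v)."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[target]][row[other]] += 1
    return {
        v: {w: n / sum(c.values()) for w, n in c.items()}
        for v, c in counts.items()
    }


def value_dissimilarity(rows, target, a, b):
    """Mean total variation distance between the co-occurrence
    distributions of values a and b, taken over every other column.
    (Assumed measure; the abstract does not specify one.)"""
    others = [j for j in range(len(rows[0])) if j != target]
    total = 0.0
    for j in others:
        dist = conditional_distributions(rows, target, j)
        pa, pb = dist.get(a, {}), dist.get(b, {})
        support = set(pa) | set(pb)
        total += 0.5 * sum(abs(pa.get(w, 0.0) - pb.get(w, 0.0)) for w in support)
    return total / len(others)


def object_dissimilarity(rows, x, y):
    """Average the per-attribute value dissimilarities of two objects."""
    return sum(
        value_dissimilarity(rows, j, x[j], y[j]) for j in range(len(x))
    ) / len(x)


# Toy data: "red" and "crimson" behave like synonyms because they
# always co-occur with the same value in the second column.
rows = [
    ("red", "small"), ("crimson", "small"),
    ("red", "small"), ("crimson", "small"),
    ("blue", "large"), ("blue", "large"),
]
synonym_gap = value_dissimilarity(rows, 0, "red", "crimson")
distinct_gap = value_dissimilarity(rows, 0, "red", "blue")
```

Under these assumptions, synonymous values end up with a dissimilarity near zero even though their labels differ, and the resulting object dissimilarities could be passed as a precomputed distance matrix to any clustering algorithm that accepts one, which matches the loose coupling the abstract describes.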