An association-based dissimilarity measure for categorical data

Authors:
Si Quang Le;Tu Bao Ho
Affiliations:
School of Knowledge Science, Japan Advanced Institute of Science and Technology, Tatsunokuchi, Ishikawa 923-1292, Japan (both authors)
Venue:
Pattern Recognition Letters
Year:
2005

Abstract

In this paper, we propose a novel method to measure the dissimilarity of categorical data. The key idea is to consider the dissimilarity between two categorical values of an attribute as a combination of dissimilarities between the conditional probability distributions of other attributes given these two values. Experiments with real data show that our dissimilarity estimation method improves the accuracy of the popular nearest neighbor classifier.
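The key idea can be illustrated with a minimal sketch. The abstract does not say which divergence is used between the conditional distributions or how the per-attribute terms are combined, so this example assumes the Jensen-Shannon divergence and a plain sum. The function names (`conditional_dist`, `js_divergence`, `value_dissimilarity`) and the toy records are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def conditional_dist(rows, target, given, value):
    """Empirical distribution of `target` over rows where `given` == `value`."""
    counts = Counter(r[target] for r in rows if r[given] == value)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()} if total else {}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two dict-based distributions.

    Assumption: the abstract does not name the divergence; JS is used here
    only because it is symmetric and bounded.
    """
    div = 0.0
    for k in set(p) | set(q):
        pk, qk = p.get(k, 0.0), q.get(k, 0.0)
        mk = 0.5 * (pk + qk)
        if pk > 0:
            div += 0.5 * pk * log2(pk / mk)
        if qk > 0:
            div += 0.5 * qk * log2(qk / mk)
    return div


def value_dissimilarity(rows, attr, a, b):
    """Dissimilarity between values `a` and `b` of attribute `attr`.

    Assumption: the per-attribute divergences are combined by summation.
    """
    if a == b:
        return 0.0
    others = [k for k in rows[0] if k != attr]
    return sum(
        js_divergence(
            conditional_dist(rows, other, attr, a),
            conditional_dist(rows, other, attr, b),
        )
        for other in others
    )


# Toy records (hypothetical data, for illustration only).
rows = [
    {"color": "red", "shape": "round", "size": "small"},
    {"color": "red", "shape": "round", "size": "small"},
    {"color": "pink", "shape": "round", "size": "small"},
    {"color": "blue", "shape": "square", "size": "large"},
]

red_vs_pink = value_dissimilarity(rows, "color", "red", "pink")
red_vs_blue = value_dissimilarity(rows, "color", "red", "blue")
```

In this toy data, "red" and "pink" co-occur with the same shape and size values, so their dissimilarity is zero, while "red" and "blue" induce disjoint conditional distributions on both other attributes and come out maximally dissimilar. A distance between whole records could then be built by summing such value dissimilarities over attributes and passed to a nearest neighbor classifier, although that step is also an assumption here.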