Association-Based Dissimilarity Measures for Categorical Data: Limitation and Improvement

  • Authors: Si Quang Le; Tu Bao Ho; Le Sy Vinh

  • Affiliations: Japan Advanced Institute of Science and Technology, Tatsunokuchi, Ishikawa, Japan; Japan Advanced Institute of Science and Technology, Tatsunokuchi, Ishikawa, Japan; John von Neumann Institute for Computing, Juelich, Germany

  • Venue: PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
  • Year: 2006

Abstract

Measuring similarity for categorical data is a challenging task in data mining due to the poor structure of categorical data. This paper presents a dissimilarity measure for categorical data based on the relations among attributes. This measure not only has the advantage of value variance but also overcomes the limitations of conditional probability-based measures when applied to databases whose attributes are independent. Experiments with 30 databases also showed that the proposed measure boosted the accuracy of Nearest Neighbor classification in comparison with the other tested measures.
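
The limitation mentioned in the abstract can be illustrated with a small sketch. It is not the authors' measure: it assumes a generic conditional probability-based dissimilarity in which two values of an attribute are compared through the distributions of the other attributes conditioned on them, with total variation distance chosen here only for simplicity. The function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import product


def conditional_dist(rows, given_attr, given_value, target_attr):
    """Empirical P(target_attr | given_attr == given_value) as a dict."""
    counts = Counter(r[target_attr] for r in rows if r[given_attr] == given_value)
    total = sum(counts.values())  # assumes given_value occurs at least once
    return {value: count / total for value, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def value_dissimilarity(rows, attr, a, b):
    """Illustrative (assumed) measure: average distance between the
    conditional distributions of every other attribute given a and b."""
    others = [j for j in range(len(rows[0])) if j != attr]
    return sum(
        total_variation(
            conditional_dist(rows, attr, a, j),
            conditional_dist(rows, attr, b, j),
        )
        for j in others
    ) / len(others)


# Toy data: two attributes, rows are tuples of categorical values.
# Independent attributes: every combination occurs equally often.
independent = list(product(["x", "y"], ["p", "q"]))
# Dependent attributes: the second attribute is determined by the first.
dependent = [("x", "p"), ("x", "p"), ("y", "q"), ("y", "q")]

d_independent = value_dissimilarity(independent, 0, "x", "y")
d_dependent = value_dissimilarity(dependent, 0, "x", "y")
```

When the attributes are independent, the conditional distributions given "x" and given "y" coincide, so this kind of measure cannot tell the two values apart; with dependent attributes it separates them fully. This is the situation the abstract refers to when it mentions databases whose attributes are independent.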