Conceptual Clustering Categorical Data with Uncertainty

  • Authors:
  • Yuni Xia;Bowei Xi

  • Venue:
  • ICTAI '07 Proceedings of the 19th IEEE International Conference on Tools with Artificial Intelligence - Volume 01
  • Year:
  • 2007

Abstract

Many real datasets have uncertain categorical attribute values that are only approximately measured or imputed. Uncertainty in categorical data is commonplace in many applications, including biological annotation, medical diagnosis, and automatic error detection. In such domains, the exact value of an attribute is often unknown but may be estimated from a number of reasonable alternatives. Current conceptual clustering algorithms do not provide a convenient means of handling this type of uncertainty. In this paper we extend a traditional conceptual clustering algorithm to explicitly handle uncertainty in data values. We propose a new total utility (TU) index for measuring the quality of the clustering, and we develop improved algorithms, based on the COBWEB conceptual clustering algorithm, for efficiently clustering uncertain categorical data. Experimental results on real datasets demonstrate how these algorithms and the new TU measure can improve clustering performance through the use of internal probabilistic information.
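
To make the setting concrete, the following is a minimal sketch of how an uncertain categorical value might be represented and scored. It assumes each attribute value is a probability distribution over alternatives, and it uses the standard category utility from COBWEB with probabilities estimated from expected counts. This is an illustrative assumption, not the TU index proposed in the paper, which the abstract does not define. All function names are hypothetical.

```python
from collections import defaultdict

# Assumption: a record is a list of attributes, and each attribute is a
# dict mapping a candidate value to its probability (summing to 1).

def attribute_distributions(records):
    """Average the per-attribute value distributions over a set of records."""
    n = len(records)
    dists = [defaultdict(float) for _ in range(len(records[0]))]
    for rec in records:
        for i, value_probs in enumerate(rec):
            for value, p in value_probs.items():
                dists[i][value] += p / n  # expected relative frequency
    return dists

def sum_squared(dists):
    """Sum of squared value probabilities over all attributes."""
    return sum(p * p for d in dists for p in d.values())

def category_utility(clusters):
    """Standard category utility, computed from expected counts.

    Not the paper's TU index; shown only to illustrate the idea of
    scoring a partition of uncertain categorical records.
    """
    all_records = [r for c in clusters for r in c]
    baseline = sum_squared(attribute_distributions(all_records))
    score = 0.0
    for c in clusters:
        weight = len(c) / len(all_records)
        score += weight * (sum_squared(attribute_distributions(c)) - baseline)
    return score / len(clusters)

# One attribute; the first record is uncertain between "x" and "y".
uncertain = [{"x": 0.9, "y": 0.1}]
x, y = [{"x": 1.0}], [{"y": 1.0}]

cu_certain_good = category_utility([[x, x], [y, y]])
cu_certain_bad = category_utility([[x, y], [x, y]])
cu_uncertain = category_utility([[uncertain, x], [y, y]])
```

In this sketch, a partition that groups like values together scores higher than one that mixes them, and replacing a certain record with an uncertain one lowers the score of the same partition, since the cluster's value distribution becomes less sharp.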