An outlier-aware data clustering algorithm in mixture models

  • Authors:
  • Nguyen Duc Thang; Chen Lihui; Chan Chee Keong

  • Affiliations:
  • School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore; School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore; School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore

  • Venue:
  • ICICS'09 Proceedings of the 7th international conference on Information, communications and signal processing
  • Year:
  • 2009

Abstract

A robust mixture-model-based clustering algorithm using genetic techniques is proposed in this paper. In many engineering and application domains, data collections often contain noisy samples and outliers, which degrade the performance of data mining methods that do not account for them. Classical probabilistic mixture-based clustering is known to be very sensitive to such situations. To improve its performance, we combine a Genetic Algorithm (GA) with the expectation-maximization (EM) procedure of the classical model. When the trimmed likelihood is used as the GA's fitness function, highly representative samples are selected and potential outliers are effectively pruned during the learning process. Experiments on both synthetic and real data from different applications show that our approach outperforms the classical mixture model by producing more accurate and reliable results.
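The abstract describes the trimmed-likelihood fitness only at a high level. The following is a minimal sketch, not the authors' implementation, assuming a univariate Gaussian mixture and a trimming rule that keeps the h = ceil(alpha * n) samples with the highest mixture log-likelihood. The trimming fraction `keep_fraction` and all function names are hypothetical. `trimmed_log_likelihood` returns a score that a GA could use to rank candidate parameter sets, and `trimmed_em_step` runs one EM update on the retained samples only. The GA operators themselves (encoding, selection, crossover, mutation) are not shown, because the abstract does not specify them.

```python
import math
from typing import List, Sequence, Tuple


def log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a univariate normal N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_mixture_density(x: float, weights: Sequence[float],
                        means: Sequence[float], variances: Sequence[float]) -> float:
    """log sum_k w_k N(x | mu_k, var_k), computed with the log-sum-exp trick."""
    terms = [math.log(w) + log_gauss(x, m, v) for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def trimmed_log_likelihood(data: Sequence[float], weights: Sequence[float],
                           means: Sequence[float], variances: Sequence[float],
                           keep_fraction: float = 0.9) -> Tuple[float, List[int]]:
    """Sum of the h largest per-sample log-likelihoods, h = ceil(keep_fraction * n).

    Assumption: the (n - h) lowest-likelihood samples are treated as
    potential outliers and excluded from the score.
    Returns the score and the sorted indices of the retained samples.
    """
    scored = [(log_mixture_density(x, weights, means, variances), i) for i, x in enumerate(data)]
    scored.sort(reverse=True)
    h = max(1, math.ceil(keep_fraction * len(data)))
    kept = scored[:h]
    return sum(s for s, _ in kept), sorted(i for _, i in kept)


def trimmed_em_step(data: Sequence[float], weights: Sequence[float],
                    means: Sequence[float], variances: Sequence[float],
                    keep_fraction: float = 0.9):
    """One EM update computed only on the samples retained by the trimming rule."""
    _, kept = trimmed_log_likelihood(data, weights, means, variances, keep_fraction)
    xs = [data[i] for i in kept]
    # E-step: responsibility of each component for each retained sample.
    resp = []
    for x in xs:
        logs = [math.log(w) + log_gauss(x, m, v) for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        unnorm = [math.exp(lg - top) for lg in logs]
        total = sum(unnorm)
        resp.append([u / total for u in unnorm])
    # M-step: re-estimate weights, means and variances from the retained samples.
    new_w, new_m, new_v = [], [], []
    for j in range(len(weights)):
        nj = sum(r[j] for r in resp)
        if nj == 0.0:
            # Component received no mass: keep its previous parameters.
            new_w.append(0.0)
            new_m.append(means[j])
            new_v.append(variances[j])
            continue
        mu = sum(r[j] * x for r, x in zip(resp, xs)) / nj
        var = sum(r[j] * (x - mu) ** 2 for r, x in zip(resp, xs)) / nj
        new_w.append(nj / len(xs))
        new_m.append(mu)
        new_v.append(max(var, 1e-6))  # variance floor to avoid a degenerate component
    return new_w, new_m, new_v, kept
```

For example, with `keep_fraction=0.8` on eight points clustered near -2 and +2 plus the extreme values 25.0 and -30.0, the two extreme values have the lowest mixture log-likelihood and are the ones trimmed, so they do not pull the re-estimated means away from the clusters. In a GA setting, each candidate in the population would be scored with a function like `trimmed_log_likelihood`; how the paper encodes candidates and interleaves GA generations with EM steps is not described in the abstract.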