An outlier-aware data clustering algorithm in mixture models

Authors:
Nguyen Duc Thang;Chen Lihui;Chan Chee Keong
Affiliations:
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore (all authors)
Venue:
ICICS'09 Proceedings of the 7th international conference on Information, communications and signal processing
Year:
2009

Abstract

This paper proposes a robust mixture model-based clustering algorithm that uses genetic techniques. In many engineering and application domains, data collections contain noisy samples and outliers, which degrade the performance of data mining methods that do not account for them. Classical probabilistic mixture-based clustering is known to be very sensitive to such contamination. To improve its performance, we combine a Genetic Algorithm (GA) with the expectation-maximization (EM) procedure of the classical model. With the trimmed likelihood as the GA fitness function, highly representative samples are selected and potential outliers are pruned effectively during learning. Experiments on both synthetic and real data from different applications show that our approach outperforms the classical mixture model, producing more accurate and reliable results.
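
The abstract does not give the algorithm's details, so the following is only a minimal sketch of how a GA could be wrapped around EM with a trimmed-likelihood fitness. Everything specific in it is an assumption made for illustration: the function names (`fit_gmm_em`, `trimmed_fitness`, `ga_trimmed_em`), the spherical Gaussian components, the encoding of an individual as a fixed-size index subset, and the GA operators (elitism, union-based crossover, swap mutation). It should not be read as the authors' implementation.

```python
import numpy as np

def component_log_density(X, weights, means, variances):
    """Log of weight_j * N(x_i | mean_j, variance_j * I) for all i, j."""
    d = X.shape[1]
    sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return np.log(weights) - 0.5 * d * np.log(2 * np.pi * variances) - 0.5 * sq / variances

def fit_gmm_em(X, k, n_iter=30, seed=0):
    """Plain EM for a spherical Gaussian mixture (assumed model, for brevity)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)]
    variances = np.full(k, X.var() + 1e-6)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities, computed stably in log space.
        log_p = component_log_density(X, weights, means, variances)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and per-component variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        variances = (resp * sq).sum(axis=0) / (d * nk) + 1e-6
    return weights, means, variances

def mixture_log_likelihood(X, params):
    """Per-sample log-likelihood under the fitted mixture."""
    log_p = component_log_density(X, *params)
    m = log_p.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_p - m).sum(axis=1))

def trimmed_fitness(X, idx, k):
    """Trimmed likelihood: fit EM on the kept samples only and score only them."""
    subset = X[idx]
    return float(mixture_log_likelihood(subset, fit_gmm_em(subset, k)).sum())

def mutate(idx, n, rng, n_swaps=2):
    """Swap a few kept samples for samples currently left out."""
    idx = idx.copy()
    outside = np.setdiff1d(np.arange(n), idx)
    swaps = min(n_swaps, len(outside))
    if swaps > 0:
        pos = rng.choice(len(idx), size=swaps, replace=False)
        idx[pos] = rng.choice(outside, size=swaps, replace=False)
    return idx

def ga_trimmed_em(X, k, keep, pop_size=10, generations=10, seed=0):
    """Search over subsets of `keep` samples; the samples left out are treated as outliers."""
    rng = np.random.default_rng(seed)
    n = len(X)
    population = [rng.choice(n, size=keep, replace=False) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda idx: trimmed_fitness(X, idx, k), reverse=True)
        history.append(trimmed_fitness(X, population[0], k))
        elite = population[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.choice(len(elite), size=2, replace=False)
            pool = np.union1d(elite[a], elite[b])  # crossover: draw from both parents
            child = rng.choice(pool, size=keep, replace=False)
            children.append(mutate(child, n, rng))
        population = elite + children
    best = max(population, key=lambda idx: trimmed_fitness(X, idx, k))
    return best, fit_gmm_em(X[best], k), history
```

Calling `ga_trimmed_em(X, k=2, keep=80)` on an array `X` of shape `(n, d)` returns the indices of the retained samples, the mixture parameters fitted on them, and the best fitness per generation. Because the elite individuals are carried over unchanged, that fitness trace never decreases, though a sketch this small makes no claim about how quickly the outliers are actually pruned.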