Robust probabilistic PCA with missing data and contribution analysis for outlier detection

Authors: Tao Chen; Elaine Martin; Gary Montague
Affiliations: School of Chemical and Biomedical Engineering, Nanyang Technological University, Singapore 637459, Singapore; School of Chemical Engineering and Advanced Materials, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK; School of Chemical Engineering and Advanced Materials, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
Venue: Computational Statistics & Data Analysis
Year: 2009

Abstract

Principal component analysis (PCA) is a widely adopted multivariate data analysis technique that can be interpreted both as a classical linear projection and through a probability model, i.e. probabilistic PCA (PPCA). Robust PPCA models based on the multivariate t-distribution have recently been proposed to handle data sets that may contain outliers. This paper presents an overview of the robust PPCA technique and further discusses the issue of missing data. An expectation-maximization (EM) algorithm is presented for maximum likelihood estimation of the model parameters in the presence of missing data. When robust PPCA is applied to outlier detection, a contribution analysis method is proposed to identify which variables contribute the most to the occurrence of outliers, providing valuable information about the source of the outlying data. The proposed technique is demonstrated on numerical examples and applied to outlier detection and diagnosis in an industrial fermentation process.
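
As a rough illustration of the kind of model the abstract describes, the sketch below is not the authors' code and is not their EM algorithm. It assumes the standard scale-mixture form of the multivariate t-distribution and fixed, already-known parameters. The names `W` (loading matrix), `mu` (mean), `sigma2` (noise variance), `nu` (degrees of freedom) and both function names are hypothetical. The first function samples from a t-distributed PPCA model. The second computes the expected latent scale weight of each observation from its observed entries only (missing entries are marked with NaN); a weight close to zero suggests an outlier.

```python
import numpy as np

def sample_robust_ppca(n, W, mu, sigma2, nu, rng):
    """Draw n samples from a t-distributed PPCA model via its Gaussian scale mixture."""
    d, q = W.shape
    # Latent scale u ~ Gamma(shape=nu/2, rate=nu/2); NumPy takes scale = 1/rate.
    u = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    # Latent scores and noise are Gaussian with covariance inflated by 1/u.
    z = rng.standard_normal((n, q)) / np.sqrt(u)[:, None]
    eps = rng.standard_normal((n, d)) * np.sqrt(sigma2 / u)[:, None]
    return z @ W.T + mu + eps

def expected_weights(X, W, mu, sigma2, nu):
    """E[u | observed part of x] for each row of X; NaN marks a missing entry."""
    d = W.shape[0]
    C = W @ W.T + sigma2 * np.eye(d)  # marginal scale matrix of x
    weights = np.empty(X.shape[0])
    for i, x in enumerate(X):
        obs = ~np.isnan(x)                     # mask of observed variables
        r = x[obs] - mu[obs]                   # residual on observed variables
        C_oo = C[np.ix_(obs, obs)]             # observed-observed block of C
        delta2 = r @ np.linalg.solve(C_oo, r)  # squared Mahalanobis distance
        weights[i] = (nu + obs.sum()) / (nu + delta2)
    return weights

# Hypothetical parameters and data, chosen only for illustration.
W = np.ones((3, 1))
mu = np.zeros(3)
X = np.array([[1.0, 1.0, 1.0],
              [1.0, np.nan, 1.0],
              [5.0, -5.0, np.nan]])
w = expected_weights(X, W, mu, sigma2=0.1, nu=4.0)
```

In this toy setting the third row lies far from the one-dimensional subspace spanned by `W`, so its weight is much smaller than the others even though one of its entries is missing. In an EM procedure, weights of this kind would typically be recomputed at each iteration alongside the parameter updates; that step is omitted here.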