Outlier identification in high dimensions

Authors:
Peter Filzmoser;Ricardo Maronna;Mark Werner
Affiliations:
Department of Statistics and Probability Theory, Vienna University of Technology, Wiedner Hauptstraße 8-10, 1040 Vienna, Austria;Department of Mathematics, Faculty of Exact Sciences, National University of La Plata, and C.I.C.P.B.A., La Plata, Argentina;Department of Mathematics, The American University in Cairo, Egypt
Venue:
Computational Statistics & Data Analysis
Year:
2008

Abstract

A computationally fast procedure for identifying outliers is presented that is particularly effective in high dimensions. This algorithm utilizes simple properties of principal components to identify outliers in the transformed space, leading to significant computational advantages for high-dimensional data. This approach requires considerably less computational time than existing methods for outlier detection, and is suitable for use on very large data sets. It is also capable of analyzing the data situation commonly found in certain biological applications in which the number of dimensions is several orders of magnitude larger than the number of observations. The performance of this method is illustrated on real and simulated data with dimension ranging in the thousands.
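
The general idea of flagging outliers in principal component space can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not give the algorithm's steps, so the robust scaling by median and MAD, the 99% explained-variance rule, and the median-plus-3-MAD cutoff are all assumptions chosen for illustration, and the names `mad` and `pc_outlier_flags` are hypothetical. The thin SVD keeps the cost tied to the smaller of the number of observations and the number of dimensions, which is why this kind of approach remains feasible when dimensions far outnumber observations.

```python
import numpy as np


def mad(a, axis=0):
    """Median absolute deviation, scaled to match the standard deviation under normality."""
    med = np.median(a, axis=axis, keepdims=True)
    return 1.4826 * np.median(np.abs(a - med), axis=axis)


def pc_outlier_flags(X, var_explained=0.99, n_mads=3.0):
    """Flag rows of X that lie far from the bulk in principal component space.

    Illustrative sketch only; every tuning choice here is an assumption.
    """
    X = np.asarray(X, dtype=float)

    # Robustly standardize each variable (assumed preprocessing step).
    col_scale = mad(X, axis=0)
    col_scale[col_scale == 0] = 1.0
    Z = (X - np.median(X, axis=0)) / col_scale

    # PCA via thin SVD; works for p >> n because only min(n, p) components exist.
    Zc = Z - Z.mean(axis=0)
    U, s, _ = np.linalg.svd(Zc, full_matrices=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(ratio, var_explained)) + 1, len(s))
    scores = U[:, :k] * s[:k]

    # Robustly rescale each retained component, then measure distance from the center.
    sc_scale = mad(scores, axis=0)
    sc_scale[sc_scale == 0] = 1.0
    scaled = (scores - np.median(scores, axis=0)) / sc_scale
    dist = np.sqrt(np.sum(scaled**2, axis=1))

    # Data-driven cutoff (assumption): median distance plus n_mads robust deviations.
    threshold = np.median(dist) + n_mads * mad(dist)
    return dist > threshold, dist


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 500))  # 60 observations, 500 dimensions
    X[:5] += 6.0                    # shift the first five rows to act as outliers
    flags, dist = pc_outlier_flags(X)
    print("flagged rows:", np.flatnonzero(flags))
```

Robust location and scale estimates are used so that the outliers themselves do not inflate the scale against which they are judged; a practical implementation would need a more careful treatment of the cutoff and of the weighting of components.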