Outlier identification in high dimensions

  • Authors:
  • Peter Filzmoser; Ricardo Maronna; Mark Werner

  • Affiliations:
  • Department of Statistics and Probability Theory, Vienna University of Technology, Wiedner Hauptstraße 8-10, 1040 Vienna, Austria; Department of Mathematics, Faculty of Exact Sciences, National University of La Plata, and C.I.C.P.B.A., La Plata, Argentina; Department of Mathematics, The American University in Cairo, Egypt

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2008

Abstract

A computationally fast procedure for identifying outliers is presented that is particularly effective in high dimensions. The algorithm uses simple properties of principal components to identify outliers in the transformed space, leading to significant computational advantages for high-dimensional data. The approach requires considerably less computational time than existing outlier-detection methods and is suitable for very large data sets. It can also handle a data situation common in certain biological applications, in which the number of dimensions is several orders of magnitude larger than the number of observations. The performance of the method is illustrated on real and simulated data with dimensions in the thousands.
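The abstract only outlines the idea of detecting outliers in principal component space. As a rough illustration, the Python/NumPy sketch below robustly standardizes the variables, projects them onto the leading principal components, robustly rescales the scores and flags observations whose distance in that transformed space is extreme. It is a simplified sketch under stated assumptions, not the authors' published procedure: the helper names `pc_outlier_flags` and `_robust_scale`, the median/MAD scaling, the 99% variance threshold and the cutoff of 3 are all hypothetical choices made for illustration.

```python
import numpy as np


def _robust_scale(A):
    """Center by the median and scale by the normal-consistent MAD (per column)."""
    med = np.median(A, axis=0)
    mad = 1.4826 * np.median(np.abs(A - med), axis=0)
    mad = np.where(mad > 0, mad, 1.0)  # guard against constant columns
    return (A - med) / mad


def pc_outlier_flags(X, var_explained=0.99, cutoff=3.0):
    """Hypothetical helper: flag rows of X (n x p) with extreme PC-score distances.

    Illustrative simplification only; not the paper's exact algorithm.
    """
    X = np.asarray(X, dtype=float)
    Z = _robust_scale(X)            # robustly standardize each variable
    Zc = Z - Z.mean(axis=0)         # center before computing principal components
    # Thin SVD: cost grows like min(n, p)^2 * max(n, p), so p >> n stays cheap
    U, s, _ = np.linalg.svd(Zc, full_matrices=False)
    var = s ** 2
    ratio = np.cumsum(var) / var.sum()
    # Number of components needed to reach the requested share of variance
    k = min(int(np.searchsorted(ratio, var_explained)) + 1, len(s))
    scores = U[:, :k] * s[:k]       # principal component scores per observation
    T = _robust_scale(scores)       # robustly rescale each retained component
    d = np.sqrt((T ** 2).sum(axis=1))  # distance in the transformed space
    # Flag observations whose distance has a large robust z-score
    flags = _robust_scale(d) > cutoff
    return flags, d
```

For example, with `X` of shape `(50, 2000)` drawn from a standard normal distribution and the first three rows shifted by `5.0` in every coordinate, `pc_outlier_flags(X)` should flag those three rows. Because the thin SVD works with the smaller of the two dimensions, this kind of approach stays fast when the number of variables far exceeds the number of observations.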