Outlier detection using the smallest kernel principal components

  • Authors: Alan J. Izenman; Yan Shen
  • Affiliations: Temple University; Temple University
  • Venue: Doctoral dissertation, Temple University
  • Year: 2007


Abstract

The smallest principal components have attracted little attention in the statistics literature. This apparent lack of interest stems from the fact that, whereas the largest principal components capture most of the total variance in the data, the smallest principal components contain only the noise and therefore carry minimal information. On the other hand, because outliers are a common source of noise, the smallest principal components should be useful for outlier detection. In this dissertation, we first review kernel methods and kernel principal component analysis. We then propose a new method for detecting outliers using the smallest kernel principal components in a feature space, rather than the smallest principal components in the original space. To apply the method, we first map the data from the original space into a feature space through a suitably defined kernel function and then perform standard principal component analysis in that feature space. We define the smallest kernel principal components and show that their corresponding eigenvalues can be viewed as residual sums of squares, so the smallest kernel principal components can be used to detect outliers with simple graphical techniques. A cutoff between "large" and "small" kernel principal components is proposed, and a nonparametric method for determining the smallest kernel principal components is suggested. Simulation studies show that, in the univariate-outlier setting, the proposed method detects outliers as well as the best available method. Real-data examples suggest that the method is at least as useful as competing methods and is sometimes better.
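The pipeline the abstract describes (build a kernel matrix, center it, eigendecompose it, and score each point by its squared projections onto the smallest retained kernel principal components) can be sketched in a few lines of NumPy. The sketch below is an illustrative reconstruction, not the dissertation's implementation: the RBF kernel, the `gamma`, `n_smallest`, and `tol` parameters, and the function name are all assumptions made for the example, and the dissertation's proposed cutoff and nonparametric selection rule are not reproduced here.

```python
import numpy as np

def smallest_kpc_scores(X, gamma=1.0, n_smallest=5, tol=1e-10):
    """Illustrative outlier scores from the smallest kernel principal
    components (a sketch of the idea in the abstract; all parameter
    choices here are assumptions, not the dissertation's)."""
    n = X.shape[0]

    # RBF kernel matrix, one common choice of kernel function.
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))

    # Double-center the kernel matrix (the standard kernel PCA step,
    # equivalent to centering the data in feature space).
    J = np.ones((n, n)) / n
    Kc = K - J @ K - K @ J + J @ K @ J

    # eigh returns eigenvalues in ascending order for a symmetric matrix.
    vals, vecs = np.linalg.eigh(Kc)

    # Discard numerically-zero eigenvalues (centering forces at least
    # one), then keep the smallest retained components: the "noise"
    # directions the abstract targets.
    keep = vals > tol
    vals, vecs = vals[keep], vecs[:, keep]
    m = min(n_smallest, vals.size)

    # Projection of training point i onto component k is
    # sqrt(lambda_k) * a_ik for unit-norm eigenvectors a_k.
    proj = vecs[:, :m] * np.sqrt(vals[:m])

    # Outlier score: squared norm of the projection onto the smallest
    # components, i.e. each point's share of the residual sum of squares.
    return np.sum(proj ** 2, axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(size=(50, 2)), [[6.0, 6.0]]])  # one planted outlier
    scores = smallest_kpc_scores(X, gamma=0.5, n_smallest=3)
    print(np.argsort(scores)[-3:])  # indices of the highest-scoring points
```

Points with the largest scores are candidate outliers and could be inspected with the simple graphical techniques the abstract mentions, such as an index plot of the scores; choosing where "large" components end and "small" ones begin is exactly the cutoff problem the dissertation addresses.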