Outlier detection via localized p-value estimation

  • Authors: Manqi Zhao; Venkatesh Saligrama
  • Affiliations: Department of Electrical and Computer Engineering, Boston University, Boston, MA (both authors)
  • Venue: Allerton'09: Proceedings of the 47th Annual Allerton Conference on Communication, Control, and Computing
  • Year: 2009

Abstract

We propose a novel non-parametric, adaptive outlier detection algorithm, called LPE, for high-dimensional data. It is based on score functions derived from nearest neighbor graphs built on a set of n nominal data points. A test sample is declared an outlier whenever its score falls below α, the desired false alarm level. The resulting detector is shown to be asymptotically optimal, in that it is uniformly most powerful at the specified false alarm level α when the density associated with the outliers is a mixture of the nominal density and a known density. The algorithm is computationally efficient: its cost is linear in the dimension and quadratic in the data size. The entire empirical Receiver Operating Characteristic (ROC) curve can be derived from the estimated score function at almost no additional cost. The method does not require choosing complicated tuning parameters or function approximation classes, and it can adapt to local structure, such as local changes in dimensionality, by incorporating manifold learning techniques. We demonstrate the algorithm on both artificial and real data sets in high-dimensional feature spaces.
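The scoring rule described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the graph statistic, so the code below assumes one plausible choice: the distance to the K-th nearest neighbor among the nominal points, with the score taken as the fraction of nominal points whose statistic is at least as large as that of the test point. The function names (`kth_nn_distance`, `lpe_scores`, `detect_outliers`) and the defaults `k=5` and `alpha=0.05` are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import numpy as np

def kth_nn_distance(points, queries, k, exclude_self=False):
    """Distance from each query to its k-th nearest neighbor among `points`."""
    # Brute-force pairwise Euclidean distances: linear in dimension,
    # quadratic in data size when queries and points are the same set.
    diffs = queries[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    if exclude_self:
        # Assumes queries are the nominal points themselves (square matrix);
        # mask the zero self-distance so a point is not its own neighbor.
        np.fill_diagonal(dists, np.inf)
    # k-th smallest distance in each row.
    return np.partition(dists, k - 1, axis=1)[:, k - 1]

def lpe_scores(nominal, test, k=5):
    """Empirical p-value-like score in [0, 1] for each test point (assumed form)."""
    g_nominal = kth_nn_distance(nominal, nominal, k, exclude_self=True)
    g_test = kth_nn_distance(nominal, test, k)
    # Fraction of nominal points whose statistic is >= the test point's statistic.
    return (g_nominal[None, :] >= g_test[:, None]).mean(axis=1)

def detect_outliers(nominal, test, alpha=0.05, k=5):
    """Flag test points whose score is at or below the false alarm level alpha."""
    return lpe_scores(nominal, test, k) <= alpha

rng = np.random.default_rng(0)
nominal = rng.normal(size=(200, 5))          # synthetic nominal sample
test = np.vstack([np.zeros(5),               # point in the dense region
                  np.full(5, 10.0)])         # point far from the nominal data
scores = lpe_scores(nominal, test)
flags = detect_outliers(nominal, test, alpha=0.05)
```

Because the scores are computed once and do not depend on α, thresholding the same `scores` array at a sweep of α values traces out an empirical ROC curve without recomputing any distances.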