Estimating the Support of a High-Dimensional Distribution

  • Authors:
  • Bernhard Schölkopf; John C. Platt; John C. Shawe-Taylor; Alex J. Smola; Robert C. Williamson

  • Affiliations:
  • Microsoft Research Ltd, Cambridge CB2 3NH, U.K.; Microsoft Research, Redmond, WA 98052, U.S.A.; Royal Holloway, University of London, Egham, Surrey TW20 0EX, U.K.; Department of Engineering, Australian National University, Canberra 0200, Australia; Department of Engineering, Australian National University, Canberra 0200, Australia

  • Venue:
  • Neural Computation
  • Year:
  • 2001

Abstract

Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1. We propose a method to approach this problem by trying to estimate a function f that is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabeled data.
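
The abstract describes the optimization only in words. As a rough sketch, the formulation usually associated with this method (the one-class support vector machine) can be written as follows. The symbols are assumptions not given in the abstract: \ell training points x_i, a feature map \Phi with kernel k(x, y) = \Phi(x) \cdot \Phi(y), slack variables \xi_i, an offset \rho, and a parameter \nu in (0, 1] that plays the role of the a priori specified value.

```latex
% Primal problem: separate the mapped data from the origin with maximal margin.
\min_{w,\,\xi,\,\rho} \quad \frac{1}{2}\|w\|^2 + \frac{1}{\nu \ell} \sum_{i=1}^{\ell} \xi_i - \rho
\qquad \text{subject to} \qquad w \cdot \Phi(x_i) \ge \rho - \xi_i, \quad \xi_i \ge 0 .

% Dual problem: the quadratic program solved for the expansion coefficients.
\min_{\alpha} \quad \frac{1}{2} \sum_{i,j=1}^{\ell} \alpha_i \alpha_j \, k(x_i, x_j)
\qquad \text{subject to} \qquad 0 \le \alpha_i \le \frac{1}{\nu \ell}, \quad \sum_{i=1}^{\ell} \alpha_i = 1 .

% Decision function: positive on the estimated region S, negative on its complement.
f(x) = \operatorname{sgn}\Bigl( \sum_{i=1}^{\ell} \alpha_i \, k(x_i, x) - \rho \Bigr)
```

Under these assumptions, only the points with \alpha_i > 0 enter the kernel expansion, which corresponds to the "potentially small subset of the training data" mentioned above, and the term \|w\|^2 is the weight-vector length that the regularization controls. Because the equality constraint fixes the sum of the coefficients, the smallest step that keeps the dual feasible changes two coefficients at once, which fits the description of sequential optimization over pairs of input patterns.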