A common setting for novelty detection assumes that labeled examples from the nominal class are available, but that labeled examples of novelties are unavailable. The standard (inductive) approach is to declare novelties where the nominal density is low, which reduces the problem to density level set estimation. In this paper, we consider the setting where an unlabeled and possibly contaminated sample is also available at learning time. We argue that novelty detection in this semi-supervised setting is naturally solved by a general reduction to a binary classification problem. In particular, a detector with a desired false positive rate can be achieved through a reduction to Neyman-Pearson classification. Unlike the inductive approach, semi-supervised novelty detection (SSND) yields detectors that are optimal (e.g., statistically consistent) regardless of the distribution on novelties. Therefore, in novelty detection, unlabeled data have a substantial impact on the theoretical properties of the decision rule. We validate the practical utility of SSND with an extensive experimental study. We also show that SSND provides distribution-free, learning-theoretic solutions to two well-known problems in hypothesis testing. First, our results provide a general solution to the two-sample problem, that is, the problem of determining whether two random samples arise from the same distribution. Second, a specialization of SSND coincides with the standard p-value approach to multiple testing under the so-called random effects model. Unlike standard rejection regions based on thresholded p-values, the general SSND framework allows for adaptation to arbitrary alternative distributions in multiple dimensions.
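The reduction described above can be sketched in a few lines: label the nominal sample as one class and the unlabeled, contaminated sample as the other, train an ordinary binary classifier, then calibrate the decision threshold on held-out nominal data so the false positive rate is approximately the desired level, in the spirit of Neyman-Pearson classification. The synthetic data, the choice of logistic regression, and the quantile-based thresholding below are illustrative assumptions for this sketch, not the paper's exact algorithm.

```python
# Sketch of semi-supervised novelty detection (SSND) via reduction to
# binary classification. Assumptions: synthetic Gaussian data, logistic
# regression as the base classifier, quantile calibration of the threshold.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Labeled nominal sample, plus an unlabeled contaminated sample:
# 80% nominal points and 20% novelties from a shifted distribution.
X_nominal = rng.normal(0.0, 1.0, size=(500, 2))
X_unlabeled = np.vstack([
    rng.normal(0.0, 1.0, size=(400, 2)),   # nominal component
    rng.normal(3.0, 1.0, size=(100, 2)),   # novel component
])

# Reduction: nominal sample -> class 0, unlabeled sample -> class 1,
# then learn an ordinary binary classifier on the combined data.
X = np.vstack([X_nominal, X_unlabeled])
y = np.concatenate([np.zeros(len(X_nominal)), np.ones(len(X_unlabeled))])
clf = LogisticRegression().fit(X, y)

# Calibrate the threshold on held-out nominal data so that roughly a
# fraction alpha of nominal points are (falsely) flagged as novel.
alpha = 0.05
X_holdout = rng.normal(0.0, 1.0, size=(500, 2))
holdout_scores = clf.predict_proba(X_holdout)[:, 1]
threshold = np.quantile(holdout_scores, 1 - alpha)

def detect(X_new):
    """Flag points whose unlabeled-class score exceeds the threshold."""
    return clf.predict_proba(X_new)[:, 1] > threshold

# Novelties far from the nominal mode should mostly be flagged, while
# fresh nominal data should be flagged at a rate close to alpha.
X_novel = rng.normal(3.0, 1.0, size=(200, 2))
print("detection rate on novelties:", detect(X_novel).mean())
print("false positive rate target:", alpha)
```

The key point the abstract makes is visible here: the classifier adapts to whatever novelty distribution contaminates the unlabeled sample, rather than relying only on low nominal density as an inductive density-level-set detector would.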