A common setting for novelty detection assumes that labeled examples from the nominal class are available, but that labeled examples of novelties are unavailable. The standard (inductive) approach is to declare novelties where the nominal density is low, which reduces the problem to density level set estimation. In this paper, we consider the setting where an unlabeled and possibly contaminated sample is also available at learning time. We argue that novelty detection in this semi-supervised setting is naturally solved by a general reduction to a binary classification problem. In particular, a detector with a desired false positive rate can be achieved through a reduction to Neyman-Pearson classification. Unlike the inductive approach, semi-supervised novelty detection (SSND) yields detectors that are optimal (e.g., statistically consistent) regardless of the distribution on novelties. Therefore, in novelty detection, unlabeled data have a substantial impact on the theoretical properties of the decision rule. We validate the practical utility of SSND with an extensive experimental study. We also show that SSND provides distribution-free, learning-theoretic solutions to two well-known problems in hypothesis testing. First, our results provide a general solution to the two-sample problem, that is, the problem of determining whether two random samples arise from the same distribution. Second, a specialization of SSND coincides with the standard p-value approach to multiple testing under the so-called random effects model. Unlike standard rejection regions based on thresholded p-values, the general SSND framework allows for adaptation to arbitrary alternative distributions in multiple dimensions.
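The reduction described above can be sketched in a few lines: label the nominal sample as one class and the unlabeled, contaminated sample as the other, train an ordinary binary classifier, then calibrate the decision threshold on held-out nominal data so the false positive rate is approximately the desired level, in the spirit of Neyman-Pearson classification. The synthetic data, the choice of logistic regression, and the quantile-based thresholding below are illustrative assumptions for this sketch, not the paper's exact algorithm.

```python
# Sketch of semi-supervised novelty detection (SSND) via reduction to
# binary classification. Assumptions: synthetic Gaussian data, logistic
# regression as the base classifier, quantile calibration of the threshold.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Labeled nominal sample, plus an unlabeled contaminated sample:
# 80% nominal points and 20% novelties from a shifted distribution.
X_nominal = rng.normal(0.0, 1.0, size=(500, 2))
X_unlabeled = np.vstack([
    rng.normal(0.0, 1.0, size=(400, 2)),   # nominal component
    rng.normal(3.0, 1.0, size=(100, 2)),   # novel component
])

# Reduction: nominal sample -> class 0, unlabeled sample -> class 1,
# then learn an ordinary binary classifier on the combined data.
X = np.vstack([X_nominal, X_unlabeled])
y = np.concatenate([np.zeros(len(X_nominal)), np.ones(len(X_unlabeled))])
clf = LogisticRegression().fit(X, y)

# Calibrate the threshold on held-out nominal data so that roughly a
# fraction alpha of nominal points are (falsely) flagged as novel.
alpha = 0.05
X_holdout = rng.normal(0.0, 1.0, size=(500, 2))
holdout_scores = clf.predict_proba(X_holdout)[:, 1]
threshold = np.quantile(holdout_scores, 1 - alpha)

def detect(X_new):
    """Flag points whose unlabeled-class score exceeds the threshold."""
    return clf.predict_proba(X_new)[:, 1] > threshold

# Novelties far from the nominal mode should mostly be flagged, while
# fresh nominal data should be flagged at a rate close to alpha.
X_novel = rng.normal(3.0, 1.0, size=(200, 2))
print("detection rate on novelties:", detect(X_novel).mean())
print("false positive rate target:", alpha)
```

The key point the abstract makes is visible here: the classifier adapts to whatever novelty distribution contaminates the unlabeled sample, rather than relying only on low nominal density as an inductive density-level-set detector would.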