Most conventional learning algorithms require both positive and negative training data to achieve accurate classification results. However, the problem of learning classifiers from only positive data arises in many applications where negative data are too costly, too difficult to obtain, or not available at all. This paper describes a new machine learning approach, called ILoNDF (Incremental data-driven Learning of Novelty Detector Filter). The approach is inspired by novelty detection theory and its learning method, which typically requires only examples from one class to learn a model. One advantage of ILoNDF is the ability of its generative learning to capture the intrinsic characteristics of the training data by continuously integrating information about the relative frequencies of the features of the training data and their co-occurrence dependencies. This makes ILoNDF rather stable and less sensitive to noisy features that may be present in the representation of the positive data. In addition, ILoNDF does not require extensive computational resources, since it operates on-line without repeated training and no parameters need to be tuned. In this study we mainly focus on the robustness of ILoNDF in dealing with high-dimensional noisy data, and we investigate how its performance varies with the amount of data available for training. To make our study comparable to previous studies, we evaluate four common methods: PCA residuals, Hotelling's T² test, an auto-associative neural network, and a one-class version of the SVM classifier (lately a favored method for one-class classification). Experiments are conducted on two real-world text corpora: Reuters and WebKB. Results show that ILoNDF tends to be more robust, is less affected by initial settings, and consistently outperforms the other methods.
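The one-class setting described above can be sketched with the one-class SVM baseline the paper compares against. This is a minimal illustration using scikit-learn's `OneClassSVM` on synthetic data, not the ILoNDF algorithm itself (which is not shown here): the model is fit on positive examples only and then flags test points as in-class (+1) or novel (-1).

```python
# Positive-only (one-class) classification sketch using the one-class SVM
# baseline mentioned in the abstract. Data are synthetic for illustration;
# this does NOT implement ILoNDF, only the comparison method.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# "Positive" training documents: feature vectors clustered in one region.
X_pos = rng.normal(loc=1.0, scale=0.3, size=(200, 10))

# Test set: half drawn from the positive region, half from elsewhere (novel).
X_test = np.vstack([
    rng.normal(loc=1.0, scale=0.3, size=(50, 10)),   # same class
    rng.normal(loc=-1.0, scale=0.3, size=(50, 10)),  # novel / "negative"
])

# Train on positive examples only; nu upper-bounds the fraction of
# training points treated as outliers.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X_pos)
pred = clf.predict(X_test)  # +1 = accepted as the known class, -1 = novelty

same_rate = (pred[:50] == 1).mean()    # recall on the known class
novel_rate = (pred[50:] == -1).mean()  # rejection rate of novel points
print(f"accepted in-class: {same_rate:.2f}, rejected novel: {novel_rate:.2f}")
```

With well-separated synthetic clusters like these, most in-class test points are accepted and nearly all novel points are rejected; on high-dimensional noisy text features the margin is far less clean, which is the regime the paper's experiments probe.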