Cluster-based one-class ensemble for classification problems in information retrieval

Authors: Nedim Lipka; Benno Stein; Maik Anderka
Affiliations: Bauhaus-Universität Weimar, Weimar, Germany (all authors)
Venue: SIGIR '12: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval
Year: 2012

Abstract

A number of relevant classification problems in information retrieval are one-class classification problems at heart: labeled data is available for only one class, the so-called target class, so common discrimination-based classification approaches, whether binary or multiclass, are not applicable. Achieving high effectiveness on one-class problems is difficult in general, and it becomes even more challenging when the target class data is multimodal, which is often the case. To address these concerns, we propose a cluster-based one-class ensemble that consists of four steps: (1) applying a clustering algorithm to the target class data, (2) training an individual one-class classifier for each of the identified clusters, (3) aggregating the decisions of the individual classifiers, and (4) selecting the best-fitting clustering model. We evaluate our approach on four datasets: an artificially generated dataset, a dataset compiled from a known multiclass text corpus, and two datasets related to one-class problems that have recently received much attention, namely authorship verification and quality flaw prediction. Our approach outperforms a one-class SVM on all four datasets.
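
The four steps can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, the base one-class classifier, the aggregation rule, or the model selection criterion, so everything concrete here is an assumption chosen for brevity. The sketch uses k-means for step 1, a centroid-plus-radius classifier for step 2, a logical OR for step 3, and, for step 4, the candidate with the smallest total accepted volume that still accepts a minimum share of held-out target data. All function names (`kmeans`, `fit_one_class`, `accepts`, `fit_ensemble`, `select_ensemble`) and the toy data are hypothetical.

```python
import math

def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, k, iters=50):
    """Step 1 (assumed algorithm): Lloyd's k-means, farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        centers = [mean(cl) if cl else c for cl, c in zip(clusters, centers)]
    return [cl for cl in clusters if cl]

def fit_one_class(cluster, quantile=0.95):
    """Step 2 (assumed classifier): centroid plus a distance-quantile radius."""
    center = mean(cluster)
    dists = sorted(math.dist(p, center) for p in cluster)
    radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return center, radius

def accepts(models, x):
    """Step 3 (assumed rule): x is a target if ANY cluster model accepts it."""
    return any(math.dist(x, center) <= radius for center, radius in models)

def fit_ensemble(train, k):
    """Steps 1 and 2 combined: one one-class model per cluster."""
    return [fit_one_class(cluster) for cluster in kmeans(train, k)]

def select_ensemble(train, held_out, ks=(1, 2, 3), min_recall=0.9):
    """Step 4 (placeholder criterion): smallest accepted volume among the
    candidates that accept at least min_recall of held-out target data."""
    dim = len(train[0])
    best = None
    for k in ks:
        models = fit_ensemble(train, k)
        recall = sum(accepts(models, x) for x in held_out) / len(held_out)
        volume = sum(radius ** dim for _, radius in models)
        if recall >= min_recall and (best is None or volume < best[0]):
            best = (volume, k, models)
    if best is None:
        raise ValueError("no candidate k reached min_recall")
    return best[1], best[2]

# Toy bimodal target class: two 3x3 grids around (0, 0) and (10, 10).
grid = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
train = grid + [(x + 10, y + 10) for x, y in grid]
held_out = [(0.5, 0.5), (-0.5, 0.5), (10.5, 9.5), (9.5, 9.5)]

k, models = select_ensemble(train, held_out)
print(k, accepts(models, (5, 5)), accepts(fit_ensemble(train, 1), (5, 5)))
```

On this toy data a single model fitted to all target points must also cover the empty region between the two modes, whereas the per-cluster ensemble can reject it. That is the intuition behind handling multimodal target classes with one classifier per cluster.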