Partially Supervised Classification of Text Documents. ICML '02: Proceedings of the Nineteenth International Conference on Machine Learning.
One-Class SVMs for Document Classification. The Journal of Machine Learning Research.
An Extensive Empirical Study of Feature Selection Metrics for Text Classification. The Journal of Machine Learning Research.
Building Text Classifiers Using Positive and Unlabeled Examples. ICDM '03: Proceedings of the Third IEEE International Conference on Data Mining.
PEBL: Web Page Classification without Negative Examples. IEEE Transactions on Knowledge and Data Engineering.
Text Classification from Positive and Unlabeled Documents. CIKM '03: Proceedings of the Twelfth International Conference on Information and Knowledge Management.
An Efficient Boosting Algorithm for Combining Preferences. The Journal of Machine Learning Research.
Feature Selection for Text Categorization on Imbalanced Data. ACM SIGKDD Explorations Newsletter, Special Issue on Learning from Imbalanced Datasets.
Learning to Rank Using Gradient Descent. ICML '05: Proceedings of the 22nd International Conference on Machine Learning.
A Support Vector Method for Multivariate Performance Measures. ICML '05: Proceedings of the 22nd International Conference on Machine Learning.
Training Linear SVMs in Linear Time. KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Learning Bayesian Classifiers from Positive and Unlabeled Examples. Pattern Recognition Letters.
Learning Classifiers from Only Positive and Unlabeled Data. KDD '08: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Feature Subset Selection from Positive and Unlabelled Examples. Pattern Recognition Letters.
Learning to Classify Texts Using Positive and Unlabeled Data. IJCAI '03: Proceedings of the 18th International Joint Conference on Artificial Intelligence.
Efficient Algorithms for Ranking with SVMs. Information Retrieval.
Query-Biased Learning to Rank for Real-Time Twitter Search. CIKM '12: Proceedings of the 21st ACM International Conference on Information and Knowledge Management.
A Survey of Learning to Rank for Real-Time Twitter Search. ICPCA/SWS '12: Proceedings of the 2012 International Conference on Pervasive Computing and the Networked World.
Clustering-Based Transduction for Learning a Ranking Model with Limited Human Labels. CIKM '13: Proceedings of the 22nd ACM International Conference on Information and Knowledge Management.
A large fraction of binary classification problems arising in web applications have a positive class that is well defined and compact, while the negative class comprises everything else in the distribution for which the classifier is developed; such a broad negative class is hard to represent and sample from. Classifiers built from only positive and unlabeled examples reduce human annotation effort significantly by removing the burden of choosing a representative set of negative examples. Various methods have been proposed in the literature for building such classifiers. Of these, the state-of-the-art methods are Biased SVM and the method of Elkan & Noto. While these methods often work well in practice, they are computationally expensive because careful hyperparameter tuning is essential, particularly when the set of labeled positive examples is small and class imbalance is high. In this paper we propose a pairwise-ranking-based approach to learning from positive and unlabeled examples (LPU) and give a theoretical justification for it. We present a pairwise RankSVM (RSVM) based method for this approach. The method is simple and efficient, and its hyperparameters are easy to tune. A detailed experimental study using several benchmark datasets shows that the proposed method gives classification performance competitive with the aforementioned state-of-the-art methods while training 3-10 times faster. We also propose an efficient AUC-based feature selection technique in the LPU setting and demonstrate its usefulness on the datasets. To gauge the quality of the LPU methods, we compare them against supervised learning (SL) methods that also use negative examples in training. SL methods perform slightly better than LPU methods when a rich set of negative examples is available, but are inferior when the number of negative training examples is not large enough.
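The core idea of the pairwise ranking approach can be illustrated with a small, self-contained sketch (not the paper's RSVM implementation): learn a linear scorer so that labeled positives rank above unlabeled examples, i.e., descend on a smooth pairwise surrogate of the AUC between the positive and unlabeled sets. All data, shapes, and hyperparameters below are hypothetical toy choices.

```python
import numpy as np

def rank_pu(X_pos, X_unl, lr=0.1, epochs=200):
    """Gradient descent on the pairwise logistic loss
    mean over pairs (p, u) of log(1 + exp(-(w.x_p - w.x_u))),
    a smooth surrogate for 1 - AUC(positive vs. unlabeled)."""
    w = np.zeros(X_pos.shape[1])
    for _ in range(epochs):
        # All pairwise feature differences: positives minus unlabeled, shape (P, U, d).
        diffs = X_pos[:, None, :] - X_unl[None, :, :]
        margins = diffs @ w                       # pairwise score differences, shape (P, U)
        # Gradient of the logistic loss: -sigmoid(-margin) * diff, averaged over pairs.
        grad = -(diffs * (1.0 / (1.0 + np.exp(margins)))[..., None]).mean(axis=(0, 1))
        w -= lr * grad
    return w

def auc_pu(w, X_pos, X_unl):
    """Empirical AUC of positives vs. unlabeled under the learned scorer."""
    s_p, s_u = X_pos @ w, X_unl @ w
    return (s_p[:, None] > s_u[None, :]).mean()

# Toy data: positives shifted away from the (mostly negative) unlabeled pool.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=1.0, size=(30, 5))
X_unl = rng.normal(loc=0.0, size=(60, 5))
w = rank_pu(X_pos, X_unl)
auc = auc_pu(w, X_pos, X_unl)
```

The quadratic number of pairs makes this naive version slow; the efficiency claims in the paper rest on RankSVM-style training that avoids materializing all pairs.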