Positive and unlabelled learning (PU learning) addresses the situation where only positive examples and unlabelled examples are available. Most previous work focuses on identifying reliable negative examples in the unlabelled data so that supervised learning methods can be applied to build a classifier. The remaining unlabelled data, which cannot be explicitly identified as positive or negative (we call them ambiguous examples), are either excluded from the training phase or simply forced into one of the two classes, which may limit classification performance. This paper proposes a novel approach, called similarity-based PU learning (SPUL), which associates each ambiguous example with two similarity weights indicating its similarity to the positive class and to the negative class, respectively. Local similarity-based and global similarity-based mechanisms are proposed to generate the similarity weights. The ambiguous examples and their similarity weights are then incorporated into an SVM-based learning phase to build a more accurate classifier. Extensive experiments on real-world datasets show that SPUL outperforms state-of-the-art PU learning methods.
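The abstract does not give the exact weighting scheme, so the following is only a minimal sketch of the general idea: ambiguous examples carry class-specific similarity weights that enter an SVM as per-sample weights. The function name `train_spul_like`, the RBF mean-similarity weights, and the trick of adding each ambiguous example once per class are illustrative assumptions, not the paper's SPUL formulation.

```python
# Hypothetical sketch of similarity-weighted PU learning with an SVM.
# NOT the paper's exact SPUL method: the weight computation and the
# per-class duplication of ambiguous examples are assumptions for illustration.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel


def train_spul_like(X_pos, X_neg, X_amb):
    """Train an SVM on positive, reliable-negative, and ambiguous examples.

    Each ambiguous example is added twice, once with a positive label and
    once with a negative label, weighted by its (assumed) similarity to the
    positive and negative sets, respectively.
    """
    # Similarity of each ambiguous example to the positive / negative class,
    # approximated here by mean RBF-kernel similarity (an assumption).
    sim_pos = rbf_kernel(X_amb, X_pos).mean(axis=1)
    sim_neg = rbf_kernel(X_amb, X_neg).mean(axis=1)
    total = sim_pos + sim_neg + 1e-12
    w_pos, w_neg = sim_pos / total, sim_neg / total  # normalised similarity weights

    # Labelled examples get full weight; ambiguous examples contribute to both
    # classes in proportion to their similarity weights.
    X = np.vstack([X_pos, X_neg, X_amb, X_amb])
    y = np.concatenate([
        np.ones(len(X_pos)), -np.ones(len(X_neg)),
        np.ones(len(X_amb)), -np.ones(len(X_amb)),
    ])
    sample_weight = np.concatenate([
        np.ones(len(X_pos)), np.ones(len(X_neg)), w_pos, w_neg,
    ])

    clf = SVC(kernel="rbf")
    clf.fit(X, y, sample_weight=sample_weight)
    return clf
```

In this sketch the weighting simply scales each ambiguous example's contribution to the SVM loss; the paper's local and global similarity mechanisms would replace the mean-kernel heuristic used above.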