Similarity-based approach for positive and unlabelled learning

  • Authors:
  • Yanshan Xiao;Bo Liu;Jie Yin;Longbing Cao;Chengqi Zhang;Zhifeng Hao

  • Affiliations:
  • School of Computer, Guangdong University of Technology, Guangzhou, China and Faculty of Engineering and IT, University of Technology, Sydney, NSW, Australia;College of Automation Science and Engineering, South China University of Technology, Guangzhou, China and School of Automation, Guangdong University of Technology, Guangzhou, China and Faculty of ...;Information Engineering Laboratory, CSIRO ICT Centre, Australia;Faculty of Engineering and IT, University of Technology, Sydney, NSW, Australia;Faculty of Engineering and IT, University of Technology, Sydney, NSW, Australia;School of Computer, Guangdong University of Technology, Guangzhou, China

  • Venue:
  • IJCAI'11 Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Two
  • Year:
  • 2011

Abstract

Positive and unlabelled learning (PU learning) addresses the situation where only positive examples and unlabelled examples are available. Most previous work focuses on identifying some negative examples from the unlabelled data so that supervised learning methods can be applied to build a classifier. However, the remaining unlabelled data, which cannot be explicitly identified as positive or negative (we call them ambiguous examples), are either excluded from the training phase or simply forced into one of the two classes. Consequently, the performance of these methods may be constrained. This paper proposes a novel approach, called the similarity-based PU learning (SPUL) method, which associates each ambiguous example with two similarity weights indicating its similarity towards the positive class and the negative class, respectively. Local similarity-based and global similarity-based mechanisms are proposed to generate the similarity weights. The ambiguous examples and their similarity weights are then incorporated into an SVM-based learning phase to build a more accurate classifier. Extensive experiments on real-world datasets show that SPUL outperforms state-of-the-art PU learning methods.
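To make the idea concrete, the sketch below illustrates one plausible way to realise the two steps described in the abstract: computing per-class similarity weights for ambiguous examples (here via a simple k-nearest-neighbour distance ratio, standing in for the paper's local mechanism) and feeding them into a weighted SVM. The function names, the distance-ratio weighting, and the label-duplication trick are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import NearestNeighbors


def local_similarity_weights(X_amb, X_pos, X_neg, k=5):
    """Hypothetical local mechanism: weight each ambiguous example by how
    close it lies to identified positives vs. negatives (k-NN distances)."""
    nn_pos = NearestNeighbors(n_neighbors=min(k, len(X_pos))).fit(X_pos)
    nn_neg = NearestNeighbors(n_neighbors=min(k, len(X_neg))).fit(X_neg)
    d_pos = nn_pos.kneighbors(X_amb)[0].mean(axis=1)
    d_neg = nn_neg.kneighbors(X_amb)[0].mean(axis=1)
    # Closer to the positive set -> larger positive-class weight, and vice versa.
    w_pos = d_neg / (d_pos + d_neg + 1e-12)
    w_neg = d_pos / (d_pos + d_neg + 1e-12)
    return w_pos, w_neg


def train_spul_like_svm(X_pos, X_neg, X_amb, w_pos, w_neg, C=1.0):
    """Incorporate ambiguous examples into an SVM by duplicating each one
    with both labels, weighted by its similarity to each class."""
    X = np.vstack([X_pos, X_neg, X_amb, X_amb])
    y = np.concatenate([np.ones(len(X_pos)), -np.ones(len(X_neg)),
                        np.ones(len(X_amb)), -np.ones(len(X_amb))])
    sample_weight = np.concatenate([np.ones(len(X_pos)), np.ones(len(X_neg)),
                                    w_pos, w_neg])
    clf = SVC(C=C, kernel="rbf")
    clf.fit(X, y, sample_weight=sample_weight)
    return clf
```

Under this reading, an ambiguous example contributes to both sides of the margin in proportion to its similarity weights, rather than being discarded or hard-assigned to one class.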