Distributional similarity vs. PU learning for entity set expansion

  • Authors:
  • Xiao-Li Li; Lei Zhang; Bing Liu; See-Kiong Ng

  • Affiliations:
  • Institute for Infocomm Research, Connexis, Singapore; University of Illinois at Chicago, Chicago, IL; University of Illinois at Chicago, Chicago, IL; Institute for Infocomm Research, Connexis, Singapore

  • Venue:
  • ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
  • Year:
  • 2010

Abstract

Distributional similarity is a classic technique for entity set expansion: the system is given a set of seed entities of a particular class and is asked to expand the set, using a corpus, with more entities of the class that the seeds represent. This paper shows that a machine learning model called positive and unlabeled learning (PU learning) can model the set expansion problem better. In tests on 10 corpora, a PU learning technique significantly outperformed distributional similarity.
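
To make the baseline concrete, the following is a minimal sketch of set expansion by distributional similarity, not the paper's implementation. It assumes each entity is represented by a bag of context words, merges the seed contexts into a centroid, and ranks candidates by cosine similarity to that centroid. The function names (`cosine`, `expand`) and the toy context counts are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand(seeds, candidates, contexts):
    """Rank candidates by similarity to the summed seed contexts."""
    centroid = Counter()
    for seed in seeds:
        centroid.update(contexts[seed])  # adds counts, does not replace
    scored = [(cosine(centroid, contexts[c]), c) for c in candidates]
    # Highest similarity first; ties broken alphabetically.
    return [c for _, c in sorted(scored, key=lambda t: (-t[0], t[1]))]


# Toy, made-up context counts (assumption: not data from the paper).
contexts = {
    "aspirin": Counter({"take": 2, "dose": 1, "tablet": 1}),
    "ibuprofen": Counter({"take": 1, "dose": 2}),
    "paracetamol": Counter({"take": 1, "tablet": 2}),
    "london": Counter({"visit": 2, "city": 1}),
}

ranked = expand(["aspirin", "ibuprofen"], ["paracetamol", "london"], contexts)
print(ranked)
```

In this framing the seeds are the only labeled data, and all of them are positive. PU learning instead treats the seeds as positive examples and the remaining candidates as unlabeled examples when training a classifier.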