Distributional similarity vs. PU learning for entity set expansion

Authors:
Xiao-Li Li;Lei Zhang;Bing Liu;See-Kiong Ng
Affiliations:
Institute for Infocomm Research, Connexis, Singapore;University of Illinois at Chicago, Chicago, IL;University of Illinois at Chicago, Chicago, IL;Institute for Infocomm Research, Connexis, Singapore
Venue:
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Year:
2010

Citing 24
Cited 4

Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging

Computational Linguistics
Combining labeled and unlabeled data with co-training

COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Text Classification from Labeled and Unlabeled Documents using EM

Machine Learning - Special issue on information retrieval
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Partially Supervised Classification of Text Documents

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
PEBL: positive example based learning for Web page classification using SVM

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Building Text Classifiers Using Positive and Unlabeled Examples

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Automatic retrieval and clustering of similar words

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Measures of distributional similarity

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Efficient support vector classifiers for named entity recognition

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
A Simple Bayesian Framework for Content-Based Image Retrieval

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Collective information extraction with relational Markov networks

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Scaling distributional similarity to large corpora

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Exploiting domain structure for named entity recognition

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
"More like these": growing entity classes from seeds

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
EntityRank: searching entities directly and holistically

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Learning classifiers from only positive and unlabeled data

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Iterative Set Expansion of Named Entities Using the Web

ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
A study on similarity and relatedness using distributional and WordNet-based approaches

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Locating complex named entities in web text

IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Learning to identify unexpected instances in the test set

IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Learning to classify texts using positive and unlabeled data

IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Unsupervised named-entity extraction from the Web: An experimental study

Artificial Intelligence
Web-scale distributional similarity and entity set expansion

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2

Negative training data can be harmful to text classification

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Entity set expansion in opinion documents

Proceedings of the 22nd ACM conference on Hypertext and hypermedia
Wikipedia entity expansion and attribute extraction from the web using semi-supervised learning

Proceedings of the sixth ACM international conference on Web search and data mining
Instance selection and instance weighting for cross-domain sentiment classification via PU learning

IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

Distributional similarity is a classic technique for entity set expansion, where the system is given a set of seed entities of a particular class, and is asked to expand the set using a corpus to obtain more entities of the same class as represented by the seeds. This paper shows that a machine learning model called positive and unlabeled learning (PU learning) can model the set expansion problem better. Based on the test results of 10 corpora, we show that a PU learning technique outperformed distributional similarity significantly.