Semi-automatic entity set refinement

Authors:
Vishnu Vyas;Patrick Pantel
Affiliations:
Yahoo! Labs, Santa Clara, CA;Yahoo! Labs, Santa Clara, CA
Venue:
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Year:
2009

Citing 22
Cited 9

Relevance feedback revisited

SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Improving Generalization with Active Learning

Machine Learning - Special issue on structured connectionist systems
Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging

Computational Linguistics
Learning dictionaries for information extraction by multi-level bootstrapping

AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
Discovering word senses from text

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Automatic retrieval and clustering of similar words

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Automatic acquisition of hyponyms from large text corpora

COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
Comparing relevance feedback algorithms for web search

WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Scaling to very very large corpora for natural language disambiguation

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Active learning for statistical natural language parsing

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Named entity recognition using an HMM-based chunk tagger

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Unsupervised named-entity extraction from the web: an experimental study

Artificial Intelligence
Collective information extraction with relational Markov networks

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Organizing and searching the world wide web of facts -- step two: harnessing the wisdom of the crowds

Proceedings of the 16th international conference on World Wide Web
Scaling up all pairs similarity search

Proceedings of the 16th international conference on World Wide Web
Weakly-supervised discovery of named entities using web search queries

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
"More like these": growing entity classes from seeds

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Language-Independent Set Expansion of Named Entities Using the Web

ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Acquiring sense tagged examples using relevance feedback

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Open information extraction from the web

IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Locating complex named entities in web text

IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Unsupervised named-entity recognition: generating gazetteers and resolving ambiguity

AI'06 Proceedings of the 19th international conference on Advances in Artificial Intelligence: Canadian Society for Computational Studies of Intelligence

Helping editors choose better seed sets for entity set expansion

Proceedings of the 18th ACM conference on Information and knowledge management
A web service for automatic word class acquisition

Proceedings of the 3rd International Universal Communication Symposium
An active learning approach to finding related terms

ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
From frequency to meaning: vector space models of semantics

Journal of Artificial Intelligence Research
Constraints based taxonomic relation classification

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Ranking class labels using query sessions

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Learning discriminative projections for text similarity measures

CoNLL '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning
Ensemble-based semantic lexicon induction for semantic tagging

SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
Ensemble semantics for large-scale unsupervised relation extraction

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Quantified Score

Hi-index	0.00

Visualization

Abstract

State of the art set expansion algorithms produce varying quality expansions for different entity types. Even for the highest quality expansions, errors still occur and manual refinements are necessary for most practical uses. In this paper, we propose algorithms to aide this refinement process, greatly reducing the amount of manual labor required. The methods rely on the fact that most expansion errors are systematic, often stemming from the fact that some seed elements are ambiguous. Using our methods, empirical evidence shows that average R-precision over random entity sets improves by 26% to 51% when given from 5 to 10 manually tagged errors. Both proposed refinement models have linear time complexity in set size allowing for practical online use in set expansion systems.