HITS-based seed selection and stop list construction for bootstrapping

Authors:
Tetsuo Kiso;Masashi Shimbo;Mamoru Komachi;Yuji Matsumoto
Affiliations:
Nara Institute of Science and Technology, Ikoma, Nara, Japan (all authors)
Venue:
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Year:
2011

References (23):
Improved algorithms for topic distillation in a hyperlinked environment. Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval.
Foundations of statistical natural language processing.
Learning dictionaries for information extraction by multi-level bootstrapping. AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference.
Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM).
Unsupervised word sense disambiguation rivaling supervised methods. ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics.
Word-sense disambiguation using decomposable models. ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics.
Automatic acquisition of hyponyms from large text corpora. COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2.
Unsupervised learning of generalized names. COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1.
Example selection for bootstrapping statistical parsers. NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1.
Corpus-based statistical sense resolution. HLT '93 Proceedings of the workshop on Human Language Technology.
Understanding the Yarowsky Algorithm. Computational Linguistics.
A bootstrapping method for learning semantic lexicons using extraction pattern contexts. EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10.
Bootstrapping POS taggers using unlabelled data. CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4.
Speech and Language Processing (2nd Edition).
Espresso: leveraging generic patterns for automatically harvesting semantic relations. ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics.
Effective self-training for parsing. HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics.
Introduction to Information Retrieval.
Graph-based analysis of semantic drift in Espresso-like bootstrapping algorithms. EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Helping editors choose better seed sets for entity set expansion. Proceedings of the 18th ACM conference on Information and knowledge management.
Reducing semantic drift with bagging and distributional similarity. ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1.
Person name disambiguation by bootstrapping. Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval.
On learning subtypes of the part-whole relation: do not mix your seeds. ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics.
Unsupervised discovery of negative categories in lexicon bootstrapping. EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing.

Cited by (2):
Minimally-supervised extraction of domain-specific part-whole relations using Wikipedia as knowledge-base. Data & Knowledge Engineering.
Editorial: Minimally-supervised learning of domain-specific causal relations using an open-domain corpus as knowledge base. Data & Knowledge Engineering.

Abstract

In bootstrapping (seed set expansion), selecting good seeds and creating stop lists are two effective ways to reduce semantic drift, but these methods generally need human supervision. In this paper, we propose a graph-based approach to helping editors choose effective seeds and stop list instances, applicable to Pantel and Pennacchiotti's Espresso bootstrapping algorithm. The idea is to select seeds and create a stop list using the rankings of instances and patterns computed by Kleinberg's HITS algorithm. Experimental results on a variation of the lexical sample task show the effectiveness of our method.
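
To make the ranking idea concrete, the following is a minimal sketch of HITS-style power iteration on a bipartite instance-pattern graph. It is not the authors' implementation: the co-occurrence weights, the names `inst_a` to `inst_d` and `pat_x` to `pat_z`, the iteration count, and the cutoff `k` are all invented for illustration. The abstract does not spell out how the rankings are turned into seeds or stop-list entries, so the sketch only surfaces the top-ranked instances as candidates for an editor to review, which is an assumption.

```python
from math import sqrt


def normalise(vector):
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    norm = sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else vector


def hits_scores(cooccurrence, iterations=50):
    """Run HITS power iteration on a bipartite instance-pattern graph.

    cooccurrence[i][p] is a non-negative weight linking instance i to
    pattern p. Returns (instance_scores, pattern_scores).
    """
    n_instances = len(cooccurrence)
    n_patterns = len(cooccurrence[0])
    instance_scores = [1.0] * n_instances
    pattern_scores = [1.0] * n_patterns
    for _ in range(iterations):
        # A pattern is scored by the instances it co-occurs with.
        pattern_scores = normalise([
            sum(cooccurrence[i][p] * instance_scores[i] for i in range(n_instances))
            for p in range(n_patterns)
        ])
        # An instance is scored by the patterns it co-occurs with.
        instance_scores = normalise([
            sum(cooccurrence[i][p] * pattern_scores[p] for p in range(n_patterns))
            for i in range(n_instances)
        ])
    return instance_scores, pattern_scores


def rank(names, scores):
    """Return names sorted by descending score."""
    return [name for name, _ in sorted(zip(names, scores), key=lambda t: -t[1])]


# Hypothetical toy data: rows are instances, columns are patterns.
instances = ["inst_a", "inst_b", "inst_c", "inst_d"]
patterns = ["pat_x", "pat_y", "pat_z"]
cooccurrence = [
    [3, 2, 1],  # inst_a co-occurs with every pattern
    [2, 1, 0],
    [0, 1, 1],
    [0, 0, 1],  # inst_d co-occurs with a single pattern
]

instance_scores, pattern_scores = hits_scores(cooccurrence)
instance_ranking = rank(instances, instance_scores)
pattern_ranking = rank(patterns, pattern_scores)

# Assumed policy: show the top-k instances to an editor as candidates.
k = 2
candidates_for_review = instance_ranking[:k]
print(candidates_for_review)
```

In this toy graph the most widely connected instance ranks first and the least connected one ranks last, which is the kind of ordering an editor could inspect before committing to seeds or stop-list entries.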