Bootstrapping has been widely used as an efficient method for extracting a group of items similar to a given set of seeds. However, bootstrapping intrinsically involves several parameters whose optimal values differ from task to task and from target to target. In this paper, we first demonstrate that this is indeed the case and constitutes a serious problem. We then propose self-adjusting bootstrapping, in which the original seed set is split into a real seed and validation data. We bootstrap from the real seed under alternative parameter settings and use the validation data to identify the best setting. This is repeated over alternative splits in typical cross-validation fashion. Finally, bootstrapping is run with the best parameter setting and the entire original seed set to produce the final output. In experiments collecting sets of company names in different categories, self-adjusting bootstrapping substantially outperformed a baseline using a uniform parameter setting.
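The tuning loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `bootstrap` is a stub standing in for a real pattern-based extractor, and the parameter name `min_len`, the grid values, and the recall-based validation score are all assumptions made for the example.

```python
from itertools import product

def bootstrap(seed, corpus, params):
    """Hypothetical bootstrapping step (stub): expand the seed set.

    A real system would induce extraction patterns from the seed;
    here we just accept corpus items sharing a first character with
    a seed, with params["min_len"] as an example tunable threshold.
    """
    found = set(seed)
    for item in corpus:
        if len(item) >= params["min_len"] and any(item[0] == s[0] for s in seed):
            found.add(item)
    return found

def self_adjusting_bootstrap(seed, corpus, param_grid, k=3):
    """Sketch of self-adjusting bootstrapping (names are assumptions).

    Each fold splits the seed into a real seed and validation data,
    scores every parameter setting by how many held-out seeds the
    bootstrapped set recovers, then the best setting is rerun with
    the entire original seed set to produce the final output.
    """
    seed = list(seed)
    settings = [dict(zip(param_grid, vals)) for vals in product(*param_grid.values())]
    scores = [0.0] * len(settings)
    for fold in range(k):
        held_out = {s for j, s in enumerate(seed) if j % k == fold}
        real_seed = [s for s in seed if s not in held_out]
        for i, params in enumerate(settings):
            extracted = bootstrap(real_seed, corpus, params)
            # validation score: recall of the held-out seed items
            scores[i] += len(held_out & extracted) / max(len(held_out), 1)
    best = settings[max(range(len(settings)), key=scores.__getitem__)]
    return bootstrap(seed, corpus, best), best
```

For instance, with seeds `["Acme Corp", "Apex Inc", "Arbor LLC"]` and a grid `{"min_len": [2, 100]}`, the validation folds reject the over-restrictive setting (it recovers no held-out seeds) and the final run uses `min_len=2` with the full seed set.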