Domain adaptive bootstrapping for named entity recognition

Authors:
Dan Wu;Wee Sun Lee;Nan Ye;Hai Leong Chieu
Affiliations:
Singapore MIT Alliance;National University of Singapore;National University of Singapore;DSO National Laboratories
Venue:
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3
Year:
2009

Citing 10
Cited 10

Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms

Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms
Transductive Inference for Text Classification using Support Vector Machines

ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Optimization, maxent models, and conditional estimation without magic

NAACL-Tutorials '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Tutorials - Volume 5
Introduction to the CoNLL-2003 shared task: language-independent named entity recognition

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Multi-criteria-based active learning for named entity recognition

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Incorporating non-local information into information extraction systems by Gibbs sampling

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Discriminative learning for differing training and test distributions

Proceedings of the 24th international conference on Machine learning
Domain adaptation with structural correspondence learning

EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Hierarchical Bayesian domain adaptation

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Domain adaptation for statistical classifiers

Journal of Artificial Intelligence Research

Negative training data can be harmful to text classification

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Domain adaptation of rule-based annotators for named-entity recognition tasks

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
A robust semi-supervised classification method for transfer learning

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Recognizing named entities in tweets

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Bootstrapped named entity recognition for product attribute extraction

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Recall-oriented learning of named entities in Arabic Wikipedia

EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Cross-domain co-extraction of sentiment and topic lexicons

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Learning multilingual named entity recognition from Wikipedia

Artificial Intelligence
Named entity recognition for tweets

ACM Transactions on Intelligent Systems and Technology (TIST) - Special section on twitter and microblogging services, social recommender systems, and CAMRa2010: Movie recommendation in context
Transfer joint embedding for cross-domain named entity recognition

ACM Transactions on Information Systems (TOIS)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Bootstrapping is the process of improving the performance of a trained classifier by iteratively adding data that is labeled by the classifier itself to the training set, and retraining the classifier. It is often used in situations where labeled training data is scarce but unlabeled data is abundant. In this paper, we consider the problem of domain adaptation: the situation where training data may not be scarce, but belongs to a different domain from the target application domain. As the distribution of unlabeled data is different from the training data, standard bootstrapping often has difficulty selecting informative data to add to the training set. We propose an effective domain adaptive bootstrapping algorithm that selects unlabeled target domain data that are informative about the target domain and easy to automatically label correctly. We call these instances bridges, as they are used to bridge the source domain to the target domain. We show that the method outperforms supervised, transductive and bootstrapping algorithms on the named entity recognition task.