Data selection in semi-supervised learning for name tagging

Authors:
Heng Ji;Ralph Grishman
Affiliations:
New York University, New York, NY;New York University, New York, NY
Venue:
IEBeyondDoc '06 Proceedings of the Workshop on Information Extraction Beyond The Document
Year:
2006

Citing 10
Cited 3

Combining labeled and unlabeled data with co-training

COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Learning dictionaries for information extraction by multi-level bootstrapping

AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
Active Hidden Markov Models for Information Extraction

IDA '01 Proceedings of the 4th International Conference on Advances in Intelligent Data Analysis
Nymble: a high-performance learning name-finder

ANLC '97 Proceedings of the fifth conference on Applied natural language processing
A self-learning universal concept spotter

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Scaling to very very large corpora for natural language disambiguation

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Scaling context space

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Ensemble methods for automatic thesaurus extraction

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
A high-performance semi-supervised learning method for text chunking

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Improving name tagging by reference resolution and relation detection

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics

Building a Graph of Names and Contextual Patterns for Named Entity Classification

ECIR '09 Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval
Can one language bootstrap the other: a case study on event extraction

SemiSupLearn '09 Proceedings of the NAACL HLT 2009 Workshop on Semi-Supervised Learning for Natural Language Processing
Cross-lingual predicate cluster acquisition to improve bilingual event extraction by inductive learning

UMSLLS '09 Proceedings of the Workshop on Unsupervised and Minimally Supervised Learning of Lexical Semantics

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present two semi-supervised learning techniques to improve a state-of-the-art multi-lingual name tagger. For English and Chinese, the overall system obtains 1.7% - 2.1% improvement in F-measure, representing a 13.5% -- 17.4% relative reduction in the spurious, missing, and incorrect tags. We also conclude that simply relying upon large corpora is not in itself sufficient: we must pay attention to unlabeled data selection too. We describe effective measures to automatically select documents and sentences.