We present two semi-supervised learning techniques to improve a state-of-the-art multilingual name tagger. For English and Chinese, the overall system obtains a 1.7% to 2.1% improvement in F-measure, representing a 13.5% to 17.4% relative reduction in spurious, missing, and incorrect tags. We also conclude that relying on large corpora alone is not sufficient: the selection of unlabeled data matters as well. We describe effective measures for automatically selecting documents and sentences.