Induction of fine-grained part-of-speech taggers via classifier combination and crosslingual projection

Authors:
Elliott Franco Drábek;David Yarowsky
Affiliations:
Johns Hopkins University, Baltimore, MD;Johns Hopkins University, Baltimore, MD
Venue:
ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts
Year:
2005

Citing 11
Cited 0

Part-of-Speech Tagging Using Decision Trees

ECML '98 Proceedings of the 10th European Conference on Machine Learning
Head-driven statistical models for natural language parsing

Head-driven statistical models for natural language parsing
Supertagging: an approach to almost parsing

Computational Linguistics
TnT: a statistical part-of-speech tagger

ANLC '00 Proceedings of the sixth conference on Applied natural language processing
Tagging inflective languages: prediction of morphological categories for a rich, structured tagset

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Principle-based parsing without overgeneration

ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Inducing multilingual text analysis tools via robust projection across aligned corpora

HLT '01 Proceedings of the first international conference on Human language technology research
Transformation-based learning in the fast lane

NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Minimally supervised induction of grammatical gender

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Improved statistical alignment models

ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Using 'smart' bilingual projection to feature-tag a monolingual dictionary

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper presents an original approach to part-of-speech tagging of fine-grained features (such as case, aspect, and adjective person/number) in languages such as English where these properties are generally not morphologically marked. The goals of such rich lexical tagging in English are to provide additional features for word alignment models in bilingual corpora (for statistical machine translation), and to provide an information source for part-of-speech tagger induction in new languages via tag projection across bilingual corpora. First, we present a classifier-combination approach to tagging English bitext with very fine-grained part-of-speech tags necessary for annotating morphologically richer languages such as Czech and French, combining the extracted features of three major English parsers, and achieve fine-grained-tag-level syntactic analysis accuracy higher than any individual parser. Second, we present experimental results for the cross-language projection of part-of-speech taggers in Czech and French via word-aligned bitext, achieving successful fine-grained part-of-speech tagging of these languages without any Czech or French training data of any kind.