This paper presents a method for bootstrapping a fine-grained, broad-coverage part-of-speech (POS) tagger in a new language using only one person-day of data acquisition effort. It requires only three resources, which are currently readily available in 60-100 world languages: (1) an online or hard-copy pocket-sized bilingual dictionary, (2) a basic library reference grammar, and (3) access to an existing monolingual text corpus in the language. The algorithm begins by inducing initial lexical POS distributions from the English translations in a bilingual dictionary that carries no POS tags. It handles irregular, regular, and semi-regular morphology through a robust generative model using weighted Levenshtein alignments. Grammatical gender is induced without supervision via global modeling of context-window feature agreement. Using a combination of these and other evidence sources, context and lexical prior models are trained interactively for fine-grained POS tag spaces. Experiments show high-accuracy, fine-grained tag resolution with minimal new human effort.
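To make the weighted Levenshtein alignment mentioned above more concrete, the following is a minimal sketch of a weighted edit distance between an inflected form and candidate roots. It is not the paper's generative model, which the abstract does not specify. The cost values, the vowel set, and the names `sub_cost`, `weighted_levenshtein`, and `closest_root` are all illustrative assumptions.

```python
# Illustrative sketch only: hand-set costs, not the weights learned in the paper.
VOWELS = set("aeiou")  # assumed vowel inventory for the example


def sub_cost(a: str, b: str) -> float:
    """Substitution cost: free for identical characters, cheap for vowel changes."""
    if a == b:
        return 0.0
    if a in VOWELS and b in VOWELS:
        return 0.5  # assumed: vowel-to-vowel changes are common in inflection
    return 1.0


def weighted_levenshtein(source: str, target: str, indel_cost: float = 1.0) -> float:
    """Dynamic-programming edit distance with weighted substitutions."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * indel_cost  # delete every source character
    for j in range(1, cols):
        dist[0][j] = j * indel_cost  # insert every target character
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,  # deletion
                dist[i][j - 1] + indel_cost,  # insertion
                dist[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]),
            )
    return dist[-1][-1]


def closest_root(inflected: str, candidates: list[str]) -> str:
    """Pick the candidate root with the smallest weighted distance."""
    return min(candidates, key=lambda root: weighted_levenshtein(inflected, root))


if __name__ == "__main__":
    print(weighted_levenshtein("sing", "sang"))
    print(closest_root("sang", ["sing", "sink", "ring"]))
```

With these assumed weights, an irregular pair such as "sang"/"sing" scores as closer than pairs that differ in a consonant, which is the kind of preference a weighted alignment is meant to capture. In a real system the weights would be estimated from data rather than fixed by hand.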