A statistical approach to machine translation
Computational Linguistics
Improving cross language retrieval with triangulated translation
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
TnT: a statistical part-of-speech tagger
ANLC '00 Proceedings of the sixth conference on Applied natural language processing
Classifier combination for improved lexical disambiguation
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Tagging inflective languages: prediction of morphological categories for a rich, structured tagset
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Improving data driven wordclass tagging by system combination
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
A statistical parser for Czech
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Inducing multilingual text analysis tools via robust projection across aligned corpora
HLT '01 Proceedings of the first international conference on Human language technology research
Serial combination of rules and statistics: a case study in Czech tagging
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Multipath translation lexicon induction via bridge languages
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Inducing multilingual POS taggers and NP bracketers via robust projection across aligned corpora
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Bootstrapping a multilingual part-of-speech tagger in one person-day
COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Bootstrapping POS taggers using unlabelled data
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Hi-index | 0.00 |
We implement a variant of the algorithm described by Yarowsky and Ngai in [21] to induce an HMM POS tagger for an arbitrary target language using only an existing POS tagger for a source language and an unannotated parallel corpus between the source and target languages. We extend this work by projecting from multiple source languages onto a single target language. We hypothesize that systematic transfer errors from differing source languages will cancel out, improving the quality of bootstrapped resources in the target language. Our experiments confirm the hypothesis. Each experiment compares three cases: (a) source data comes from a single language A, (b) source data comes from a single language B, and (c) source data comes from both A and B, but half as much from each. Apart from the source language, other conditions are held constant in all three cases – including the total amount of source data used. The null hypothesis is that performance in the mixed case would be an average of performance in the single-language cases, but in fact, mixed-case performance always exceeds the maximum of the single-language cases. We observed this effect in all six experiments we ran, involving three different source-language pairs and two different target languages.