Identifying word correspondence in parallel texts
HLT '91 Proceedings of the workshop on Speech and Natural Language
A program for aligning sentences in bilingual corpora
Computational Linguistics - Special issue on using large corpora: I
Automatic identification of non-compositional phrases
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Word alignment for languages with scarce resources
ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts
Nukti: English-Inuktitut word alignment system description
ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts
Models for Inuktitut-English word alignment
ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts
An extensible crosslinguistic readability framework
BUCC '09 Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora
Automatic identification of parallel documents with light or without linguistic resources
AI'05 Proceedings of the 18th Canadian Society conference on Advances in Artificial Intelligence
Evaluating a morphological analyser of Inuktitut
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Hi-index | 0.00 |
A parallel corpus of texts in English and in Inuktitut, an Inuit language, is presented. These texts are from the Nunavut Hansards. The parallel texts are processed in two phases, the sentence alignment phase and the word correspondence phase. Our sentence alignment technique achieves a precision of 91.4% and a recall of 92.3%. Our word correspondence technique is aimed at providing the broadest coverage collection of reliable pairs of Inuktitut and English morphemes for dictionary expansion. For an agglutinative language like Inuktitut, this entails considering substrings, not simply whole words. We employ a Pointwise Mutual Information method (PMI) and attain a coverage of 72.3% of English words and a precision of 87%.