We experiment with splitting words into their stem and suffix components to model morphologically rich languages. We show that using a morphological analyzer and disambiguator results in a significant perplexity reduction in Turkish. We present flexible n-gram models, Flex-Grams, which assume that the n−1 tokens that determine the probability of a given token can be chosen anywhere in the sentence, rather than being restricted to the preceding n−1 positions. Our final model achieves a 27% perplexity reduction compared to the standard n-gram model.
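The Flex-Gram idea can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors' implementation. It makes the following assumptions:

- Words have already been split into stem and suffix tokens. Suffixes are marked with a leading `+`, standing in for the output of a morphological analyzer and disambiguator.
- Each prediction uses exactly one context token (n = 2).
- Smoothing is add-one, which is not necessarily what the paper uses.

Only the context selector changes between the two models. The standard bigram uses `prev_token`. The flexible variant uses `prev_stem`, a hypothetical rule that skips suffix tokens and conditions on the nearest preceding stem. Perplexity is the exponentiated average negative log-probability per token.

```python
import math
from collections import Counter
from typing import Callable, List

# A selector picks the single context token for position i (n = 2 here).
Selector = Callable[[List[str], int], str]


def prev_token(toks: List[str], i: int) -> str:
    """Standard bigram context: the immediately preceding token."""
    return toks[i - 1] if i > 0 else "<s>"


def prev_stem(toks: List[str], i: int) -> str:
    """Hypothetical flexible context: the nearest preceding stem,
    skipping suffix tokens (marked in this sketch with a leading '+')."""
    for j in range(i - 1, -1, -1):
        if not toks[j].startswith("+"):
            return toks[j]
    return "<s>"


def train(sentences: List[List[str]], select: Selector):
    """Count (context, token) pairs, using `select` to choose each context."""
    pairs, ctxs, vocab = Counter(), Counter(), set()
    for toks in sentences:
        for i, tok in enumerate(toks):
            ctx = select(toks, i)
            pairs[(ctx, tok)] += 1
            ctxs[ctx] += 1
            vocab.add(tok)
    return pairs, ctxs, vocab


def perplexity(sentences: List[List[str]], select: Selector, model) -> float:
    """exp(-(1/N) * sum of log p(token | selected context))."""
    pairs, ctxs, vocab = model
    v = len(vocab) + 1  # one extra slot reserves mass for unseen tokens
    log_prob, n = 0.0, 0
    for toks in sentences:
        for i, tok in enumerate(toks):
            ctx = select(toks, i)
            # Add-one smoothing: an assumption of this sketch only.
            p = (pairs[(ctx, tok)] + 1) / (ctxs[ctx] + v)
            log_prob += math.log(p)
            n += 1
    return math.exp(-log_prob / n)


if __name__ == "__main__":
    # Toy, pre-segmented data for illustration only (not from the paper).
    train_data = [
        ["the", "dog", "+s", "bark", "+ed"],
        ["the", "cat", "+s", "sleep", "+ed"],
        ["a", "dog", "bark", "+s"],
    ]
    test_data = [["the", "cat", "+s", "bark", "+ed"]]
    for name, sel in [("bigram", prev_token), ("prev-stem", prev_stem)]:
        model = train(train_data, sel)
        print(name, round(perplexity(test_data, sel, model), 3))
```

Comparing the two printed values shows how a perplexity reduction is measured. The toy numbers themselves mean nothing. For scale, a 27% reduction means the final model's perplexity is 0.73 times that of the standard n-gram baseline.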