This research examines the effects of word order and segmentation on translation retrieval performance in an experimental Japanese-English translation memory system. We implement a range of bag-of-words and word order-sensitive similarity metrics, and test each under both character-based and word-based indexing. The translation retrieval performance of each system configuration is evaluated empirically using the word edit distance between the translation candidates it outputs and the model translation. Our results indicate that character-based indexing is consistently superior to word-based indexing, suggesting that segmentation is an unnecessary luxury in the given domain. Word order-sensitive approaches generally outperform bag-of-words methods, with source language segment-level edit distance proving the most effective similarity metric.
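The contrast between order-sensitive and bag-of-words matching, and between character-based and word-based indexing, can be illustrated with a small sketch. It is not the system described above: the abstract does not specify the exact metrics, so the Dice coefficient stands in here as one possible bag-of-words measure, and the function names (`edit_distance`, `bag_overlap`, `char_tokens`, `word_tokens`) are hypothetical. Word-based indexing is assumed to receive input that has already been segmented with spaces.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two token sequences (unit costs).

    Order-sensitive: the same tokens in a different order score > 0.
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def bag_overlap(a, b):
    """Dice coefficient over token multisets (assumed stand-in metric).

    Order-insensitive: any reordering of the same tokens scores 1.0.
    """
    shared = sum((Counter(a) & Counter(b)).values())
    total = len(a) + len(b)
    return 2 * shared / total if total else 1.0


def char_tokens(text):
    """Character-based indexing: one token per non-space character."""
    return [c for c in text if not c.isspace()]


def word_tokens(text):
    """Word-based indexing: assumes the text is already segmented by spaces."""
    return text.split()


if __name__ == "__main__":
    # Same words, different order: only the edit distance sees a difference.
    s1, s2 = "the dog bit the man", "the man bit the dog"
    print(bag_overlap(word_tokens(s1), word_tokens(s2)))    # 1.0
    print(edit_distance(word_tokens(s1), word_tokens(s2)))  # 2

    # Pre-segmented Japanese, compared under both indexing schemes.
    j1, j2 = "東京 に 行く", "京都 に 行く"
    print(edit_distance(word_tokens(j1), word_tokens(j2)))
    print(edit_distance(char_tokens(j1), char_tokens(j2)))
```

Because `edit_distance` accepts any token sequence, the same function serves both indexing schemes; only the tokenizer changes. Character tokens need no segmenter at all, which is the practical appeal of character-based indexing for Japanese.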