In this paper, we compare the relative effects of segment order, segmentation and segment contiguity on the retrieval performance of a translation memory system. We take a selection of bag-of-words and segment order-sensitive string comparison methods and run each over both character- and word-segmented data, in combination with a range of local segment contiguity models (in the form of N-grams). Over two distinct datasets, we find that indexing on simple character bigrams yields higher retrieval accuracy than any of the tested word N-gram models. Further, in their optimal configuration, bag-of-words methods are shown to be equivalent to segment order-sensitive methods in retrieval accuracy, but much faster. We also provide evidence that our findings are scalable.
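To make the contrast between the two families of methods concrete, here is a minimal Python sketch of character-bigram indexing with one bag-of-words and one order-sensitive comparison. It assumes a Dice coefficient as the bag-of-words metric and segment-level Levenshtein distance as the order-sensitive metric. The abstract does not name the specific metrics used, and `char_ngrams`, `dice_bag_of_words`, `edit_distance` and `retrieve` are hypothetical helpers for illustration only.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Split a string into overlapping character n-grams (bigrams by default).
    Whitespace is stripped first, assuming unsegmented input such as Japanese."""
    chars = "".join(text.split())
    if len(chars) < n:
        return [chars] if chars else []
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]


def dice_bag_of_words(a: list[str], b: list[str]) -> float:
    """Order-insensitive similarity (assumed metric): Dice coefficient
    over the multisets of segments, ignoring where each segment occurs."""
    if not a and not b:
        return 1.0
    overlap = sum((Counter(a) & Counter(b)).values())
    return 2.0 * overlap / (len(a) + len(b))


def edit_distance(a: list[str], b: list[str]) -> int:
    """Order-sensitive comparison (assumed metric): unit-cost Levenshtein
    distance over segment sequences, computed one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute or match
        prev = curr
    return prev[-1]


def retrieve(query: str, memory: list[str], method: str = "bow") -> str:
    """Return the translation-memory entry most similar to the query,
    indexing both sides as character bigrams (hypothetical helper)."""
    q = char_ngrams(query)
    if method == "bow":
        return max(memory, key=lambda s: dice_bag_of_words(q, char_ngrams(s)))
    if method == "edit":
        return min(memory, key=lambda s: edit_distance(q, char_ngrams(s)))
    raise ValueError(f"unknown method: {method!r}")


if __name__ == "__main__":
    print(char_ngrams("東京都"))  # overlapping bigrams of an unsegmented string
    memory = ["abcdef", "uvwxyz"]
    print(retrieve("abcdxx", memory, "bow"), retrieve("abcdxx", memory, "edit"))
```

In this sketch the bag-of-words score is linear in the number of segments, while the edit distance fills a quadratic table, so swapping segments such as `["ab", "cd"]` for `["cd", "ab"]` leaves the Dice score unchanged but costs two edits. That cost gap is one illustration of why an order-insensitive method can run much faster; it is not a claim about the paper's actual implementation.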