Accurate methods for the statistics of surprise and coincidence
Computational Linguistics - Special issue on using large corpora: I
EACL '95 Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics
Multilingual text retrieval extends the basic monolingual detection task to include retrieving relevant documents in languages other than the query language. The task therefore merges efforts in machine translation with efforts in text retrieval, but the machine translation component may be substantially simplified due to some basic assumptions about the design and implementation of high-performance text retrieval systems. A primary consideration is that most modern text retrieval systems regard queries and documents as unordered "bags" of words. The translation of an unordered set of terms is therefore approximately the translation of the terms themselves. Although a linearity assumption such as this breaks down when considering phrasal elements in most languages, it is reasonably accurate for many terms and becomes increasingly accurate at the sentence level and above.
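As a rough illustration of the bag-of-words argument above, the following minimal sketch translates a query term by term and shows that word order does not affect the result. The lexicon, the function name `translate_bag`, and the pass-through handling of unknown terms are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy English-to-Spanish lexicon, for illustration only.
TOY_LEXICON = {
    "text": ["texto"],
    "retrieval": ["recuperación"],
    "system": ["sistema"],
}


def translate_bag(query_terms, lexicon):
    """Translate an unordered bag of terms one term at a time."""
    translated = Counter()
    for term, count in Counter(query_terms).items():
        # Assumption: terms missing from the lexicon pass through unchanged.
        for target in lexicon.get(term, [term]):
            translated[target] += count
    return translated


# Two orderings of the same query yield the same translated bag.
bag_a = translate_bag(["text", "retrieval", "system"], TOY_LEXICON)
bag_b = translate_bag(["system", "text", "retrieval"], TOY_LEXICON)
print(bag_a == bag_b)
```

Because each term is looked up independently, a sketch like this cannot handle phrasal elements whose translation differs from the translation of their parts, which is the limitation the abstract notes.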