Comparing different units for query translation in Chinese cross-language information retrieval

Authors:
Lixin Shi;Jian-Yun Nie;Jing Bai
Affiliations:
Université de Montréal, Montréal, Québec, Canada;Université de Montréal, Montréal, Québec, Canada;Université de Montréal, Montréal, Québec, Canada
Venue:
Proceedings of the 2nd international conference on Scalable information systems
Year:
2007

Abstract

Words and character n-grams have both been used as indexing units in Chinese IR, but usually as two competing methods. For cross-language IR (CLIR) with Chinese, all previous studies have used word translation. In this paper, we re-examine the use of n-grams and words for monolingual Chinese IR and show that both types of indexing units can be combined within the language modeling framework to produce higher retrieval effectiveness. For CLIR with Chinese, we investigate the possibility of using bigrams and unigrams as translation units. Several translation models from English words to Chinese unigrams, bigrams, and words are created from a parallel corpus. An English query is then translated in several ways, each producing a ranking score, and the final ranking score combines the scores from all these types of translation. Our experiments on several collections show that Chinese character n-grams are reasonable alternative translation units to words and lead to retrieval effectiveness comparable to that of words. In addition, combining words and n-grams produces higher effectiveness.
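
As a rough illustration of the units discussed in the abstract, the sketch below extracts overlapping character unigrams and bigrams from a Chinese string and combines per-unit ranking scores. The abstract does not say how the scores are combined, so the weighted sum here is an assumption, and the function names `char_ngrams` and `combine_scores`, the weights, and the score values are all hypothetical.

```python
def char_ngrams(text, n):
    """Return the overlapping character n-grams of a string, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def combine_scores(scores, weights):
    """Weighted sum of per-unit ranking scores.

    Assumption: a linear combination is used only for illustration;
    the abstract does not specify the actual combination rule.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same unit types")
    return sum(weights[unit] * scores[unit] for unit in scores)


# Character units for a short Chinese phrase ("information retrieval").
unigrams = char_ngrams("信息检索", 1)   # ['信', '息', '检', '索']
bigrams = char_ngrams("信息检索", 2)    # ['信息', '息检', '检索']

# Hypothetical per-unit scores (e.g. log-likelihoods) for one document.
scores = {"word": -4.0, "bigram": -5.0, "unigram": -6.0}
weights = {"word": 0.5, "bigram": 0.25, "unigram": 0.25}
final_score = combine_scores(scores, weights)
```

Overlapping bigrams need no word segmenter, which is what makes them a candidate alternative to words as translation units; in practice the combination weights would have to be tuned on held-out data.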