Although both words and n-grams of characters have been used in Chinese IR, they have often been used as two competing methods. For cross-language IR with Chinese, word translation has been used in all previous studies. In this paper, we re-examine the use of n-grams and words for monolingual Chinese IR. We show that both types of indexing unit can be combined within the language modeling framework to produce higher retrieval effectiveness. For CLIR with Chinese, we investigate the possibility of using bigrams and unigrams as translation units. Several translation models from English words to Chinese unigrams, bigrams and words are created based on a parallel corpus. An English query is then translated in several ways, each producing a ranking score. The final ranking score combines all these types of translation. Our experiments on several collections show that Chinese character n-grams are reasonable alternative translation units to words, and they lead to retrieval effectiveness comparable to words. In addition, combinations of both words and n-grams produce higher effectiveness.
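As a rough illustration of the monolingual idea described above, the following sketch indexes Chinese text by character unigrams and bigrams, scores a document against a query with a smoothed query-likelihood model for each unit type, and then combines the two scores linearly. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`char_ngrams`, `query_log_likelihood`, `combined_score`), the choice of Jelinek-Mercer smoothing, the smoothing parameter `lam`, and the interpolation `weights` are all hypothetical and chosen only for illustration. Word-based indexing and the translation models for CLIR are omitted.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return overlapping character n-grams of `text`, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def query_log_likelihood(query_units, doc_units, collection_units, lam=0.5):
    """Query log-likelihood under a Jelinek-Mercer smoothed document model.

    `lam` is the weight of the collection model (an assumed value).
    Units that never occur in the collection are skipped.
    """
    doc_tf = Counter(doc_units)
    col_tf = Counter(collection_units)
    doc_len = len(doc_units)
    col_len = len(collection_units)
    score = 0.0
    for unit in query_units:
        p_doc = doc_tf[unit] / doc_len if doc_len else 0.0
        p_col = col_tf[unit] / col_len if col_len else 0.0
        p = (1 - lam) * p_doc + lam * p_col
        if p > 0:
            score += math.log(p)
    return score


def combined_score(query, doc, collection, weights=(0.3, 0.7)):
    """Linearly combine unigram and bigram scores (weights are assumptions)."""
    total = 0.0
    for n, weight in zip((1, 2), weights):
        # Collection statistics are gathered per document, so n-grams
        # never span a document boundary.
        col_units = [u for d in collection for u in char_ngrams(d, n)]
        total += weight * query_log_likelihood(
            char_ngrams(query, n), char_ngrams(doc, n), col_units
        )
    return total


if __name__ == "__main__":
    docs = ["信息检索系统", "机器翻译模型"]
    query = "信息检索"
    ranked = sorted(docs, key=lambda d: combined_score(query, d, docs), reverse=True)
    print(ranked)
```

In this toy setup the first document shares every unigram and bigram with the query, so it ranks above the second. In practice the interpolation weights would need to be tuned on held-out data, and a word-based score could be added as a third component in the same way.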