Using English information in non-English web search

Authors:
Wei Gao;John Blitzer;Ming Zhou
Affiliations:
The Chinese University of Hong Kong, Hong Kong, China;University of California Berkeley, Berkeley, CA, USA;Microsoft Research Asia, Beijing, China
Venue:
Proceedings of the 2nd ACM workshop on Improving non english web searching
Year:
2008

Citing 17
Cited 5

Combining labeled and unlabeled data with co-training

COLT '98 Proceedings of the eleventh annual conference on Computational learning theory
The anatomy of a large-scale hypertextual Web search engine

WWW7 Proceedings of the seventh international conference on World Wide Web 7
IR evaluation methods for retrieving highly relevant documents

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Improving query translation for cross-language information retrieval using statistical models

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Optimizing search engines using clickthrough data

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Query type classification for web document retrieval

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Cross-Language Evaluation Forum: Objectives, Results, Achievements

Information Retrieval
An efficient boosting algorithm for combining preferences

The Journal of Machine Learning Research
Learning random walk models for inducing word dependency distributions

ICML '04 Proceedings of the twenty-first international conference on Machine learning
Learning to rank using gradient descent

ICML '05 Proceedings of the 22nd international conference on Machine learning
Weakly supervised named entity transliteration and discovery from multilingual comparable corpora

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Mining key phrase translations from web corpora

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Pegasos: Primal Estimated sub-GrAdient SOlver for SVM

Proceedings of the 24th international conference on Machine learning
A support vector method for optimizing average precision

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Learning to rank relational objects and its application to web search

Proceedings of the 17th international conference on World Wide Web
CLEF 2005: multilingual retrieval by combining multiple multilingual ranked lists

CLEF'05 Proceedings of the 6th international conference on Cross-Language Evaluation Forum: accessing Multilingual Information Repositories
Selection and merging strategies for multilingual information retrieval

CLEF'04 Proceedings of the 5th conference on Cross-Language Evaluation Forum: multilingual Information Access for Text, Speech and Images

Multilingual PRF: english lends a helping hand

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Multilingual pseudo-relevance feedback: performance study of assisting languages

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Fractional similarity: cross-lingual feature selection for search

ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
From bilingual dictionaries to interlingual document representations

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Learning regional transliteration variants

Information Processing and Management: an International Journal

Abstract

The leading web search engines have spent a decade building highly specialized ranking functions for English web pages. One reason these ranking functions are effective is that they are designed around features such as PageRank, automatic query and domain taxonomies, and click-through information. Unfortunately, many of these features are absent or altered in other languages. In this work, we show how to exploit these English features for a subset of Chinese queries, which we call linguistically non-local (LNL). An LNL Chinese query has a minimally ambiguous English translation that also functions as a good English query. We first show how to identify pairs of Chinese LNL queries and their English counterparts from Chinese and English query logs. Then we show how to exploit these pairs to improve Chinese relevance ranking. Our improved relevance ranker proceeds by (1) translating a query into English, (2) computing a cross-lingual relational graph between the Chinese and English documents, and (3) employing the relational ranking method of Qin et al. [15] to rank the Chinese documents. Our technique gives consistent improvements over a state-of-the-art Chinese monolingual ranker on web search data from the Microsoft Live China search engine.
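
The three steps of the ranker can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the form of the relational ranking objective, so the code assumes a simple graph-regularized smoothing, in which each document's final score stays close to its base ranker score while linked documents are pulled toward similar scores. The names LNL_PAIRS, translate_query, smooth_scores, rank_chinese, the parameter beta, and all scores and edge weights are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table of LNL query pairs mined from query logs (illustrative data).
LNL_PAIRS: Dict[str, str] = {"哈利波特": "harry potter"}


def translate_query(zh_query: str) -> Optional[str]:
    """Step 1: return the English counterpart, or None if the query is not LNL."""
    return LNL_PAIRS.get(zh_query)


def smooth_scores(
    base: Dict[str, float],
    edges: Dict[Tuple[str, str], float],
    beta: float = 1.0,
    sweeps: int = 200,
) -> Dict[str, float]:
    """Step 3 (assumed objective): minimise
        sum_i (z_i - base_i)^2 + beta * sum_{(i,j)} w_ij * (z_i - z_j)^2
    by Gauss-Seidel sweeps over the stationarity condition of each z_i.
    """
    neighbours: Dict[str, List[Tuple[str, float]]] = {d: [] for d in base}
    for (a, b), w in edges.items():
        neighbours[a].append((b, w))
        neighbours[b].append((a, w))

    z = dict(base)
    for _ in range(sweeps):
        for d in z:
            pull = sum(w * z[n] for n, w in neighbours[d])
            degree = sum(w for _, w in neighbours[d])
            z[d] = (base[d] + beta * pull) / (1.0 + beta * degree)
    return z


def rank_chinese(scores: Dict[str, float], zh_docs: List[str]) -> List[str]:
    """Order the Chinese documents by their smoothed scores, best first."""
    return sorted(zh_docs, key=lambda d: scores[d], reverse=True)


# Toy inputs: base scores from a Chinese ranker and an English ranker.
zh_base = {"zh1": 0.50, "zh2": 0.55}
en_base = {"en1": 0.90, "en2": 0.10}

# Step 2: cross-lingual relational graph; weights stand in for document similarity.
graph = {("zh1", "en1"): 1.0, ("zh2", "en2"): 1.0}

english_query = translate_query("哈利波特")
smoothed = smooth_scores({**zh_base, **en_base}, graph)
ranking = rank_chinese(smoothed, list(zh_base))
print(english_query, ranking)
```

On this toy input the Chinese base scores alone would place zh2 first. After smoothing, zh1 moves ahead because it is linked to the highly scored English page, which is the kind of transfer of English ranking evidence the abstract describes.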