Cross-language query classification using web search for exogenous knowledge

  • Authors:
  • Xuerui Wang;Andrei Broder;Evgeniy Gabrilovich;Vanja Josifovski;Bo Pang

  • Affiliations:
  • University of Massachusetts, Amherst, MA;Yahoo! Research, Santa Clara, CA;Yahoo! Research, Santa Clara, CA;Yahoo! Research, Santa Clara, CA;Yahoo! Research, Santa Clara, CA

  • Venue:
  • Proceedings of the Second ACM International Conference on Web Search and Data Mining
  • Year:
  • 2009

Quantified Score

Hi-index 0.01

Visualization

Abstract

The non-English Web is growing at phenomenal speed, but available language processing tools and resources are predominantly English-based. Taxonomies are a case in point: while there are plenty of commercial and non-commercial taxonomies for the English Web, taxonomies for other languages are either not available or of arguable quality. Given that building comprehensive taxonomies for each language is prohibitively expensive, it is natural to ask whether existing English taxonomies can be leveraged, possibly via machine translation, to enable text processing tasks in other languages. Our experimental results confirm that the answer is affirmative with respect to at least one task. In this study we focus on query classification, which is essential for understanding the user intent both in Web search and in online advertising. We propose a robust method for classifying non-English queries into an English taxonomy, using an existing English text classifier and off-the-shelf machine translation systems. In particular, we show that by considering the Web search results in the query's original language as additional sources of information, we can alleviate the effect of erroneous machine translation. Empirical evaluation on query sets in languages as diverse as Chinese and Russian yields very encouraging results; consequently, we believe that our approach is also applicable to many additional languages.