Bridging lexical gaps between queries and questions on large online Q&A collections with compact translation models

  • Authors:
  • Jung-Tae Lee; Sang-Bum Kim; Young-In Song; Hae-Chang Rim

  • Affiliations:
  • Korea University, Seoul, Korea; Search Business Team, SK Telecom, Seoul, Korea; Korea University, Seoul, Korea; Korea University, Seoul, Korea

  • Venue:
  • EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2008

Abstract

Lexical gaps between queries and questions (documents) have been a major issue in question retrieval on large online question and answer (Q&A) collections. Previous studies address the issue by implicitly expanding queries with the help of translation models pre-constructed using statistical techniques. However, because unimportant words (e.g., non-topical words, common words) can be included in the translation models, a lack of noise control in the models can degrade retrieval performance. This paper investigates a number of empirical methods for eliminating unimportant words in order to construct compact translation models for retrieval purposes. Experiments conducted on a real-world Q&A collection show that substantial improvements in retrieval performance can be achieved by using compact translation models.
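
To make the idea in the abstract concrete, the following is a minimal sketch of translation-based question scoring with a pruned ("compact") translation table. It is an illustration under stated assumptions, not the authors' method: the abstract does not specify which elimination methods the paper uses, so a plain stopword filter stands in here. The table values, the `STOPWORDS` set, the smoothing weight `lam`, and the names `compact` and `score` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy translation table: TRANSLATION[t][w] = P(w | t), the
# probability that question word t "translates" to query word w.
# All values are invented for illustration only.
TRANSLATION = {
    "laptop": {"laptop": 0.6, "notebook": 0.3, "the": 0.1},
    "notebook": {"notebook": 0.7, "laptop": 0.2, "the": 0.1},
    "fix": {"fix": 0.5, "repair": 0.4, "the": 0.1},
    "repair": {"repair": 0.6, "fix": 0.3, "the": 0.1},
    "the": {"the": 1.0},
}

# Assumed list of unimportant words; a stand-in for whatever
# elimination method is actually applied.
STOPWORDS = {"the"}


def compact(table, unimportant):
    """Drop unimportant words from both sides of the table, then renormalize rows."""
    out = {}
    for src, row in table.items():
        if src in unimportant:
            continue
        kept = {w: p for w, p in row.items() if w not in unimportant}
        total = sum(kept.values())
        if total > 0:
            out[src] = {w: p / total for w, p in kept.items()}
    return out


def score(query, question, table, background, lam=0.2):
    """Log-likelihood of the query under a translation-based language model.

    P(w | question) = sum over question words t of P(w | t) * P_ml(t | question),
    linearly smoothed with a background probability for w.
    """
    counts = Counter(question)
    n = len(question)
    logp = 0.0
    for w in query:
        p_trans = sum(
            table.get(t, {}).get(w, 0.0) * c / n for t, c in counts.items()
        )
        p = (1 - lam) * p_trans + lam * background.get(w, 1e-6)
        logp += math.log(p)
    return logp


compact_table = compact(TRANSLATION, STOPWORDS)
related = score(["repair", "notebook"], ["fix", "laptop"], compact_table, {})
unrelated = score(["repair", "notebook"], ["cook", "pasta"], compact_table, {})
print(related > unrelated)
```

In this toy setup the query "repair notebook" shares no words with the question "fix laptop", yet the translation probabilities let the question outscore an unrelated one. Pruning removes the probability mass that every row previously assigned to the common word "the", which is the kind of noise the abstract refers to.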