Concept-based feature generation and selection for information retrieval

  • Authors:
  • Ofer Egozi;Evgeniy Gabrilovich;Shaul Markovitch

  • Affiliations:
  • Department of Computer Science, Technion-Israel Institute of Technology, Haifa, Israel;Yahoo! Research, Santa Clara, CA and Department of Computer Science, Technion-Israel Institute of Technology, Haifa, Israel;Department of Computer Science, Technion-Israel Institute of Technology, Haifa, Israel

  • Venue:
  • AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
  • Year:
  • 2008

Abstract

Traditional information retrieval systems use query words to identify relevant documents. In difficult retrieval tasks, however, one needs access to a wealth of background knowledge. We present a method that uses Wikipedia-based feature generation to improve retrieval performance. Intuitively, we expect that using extensive world knowledge is likely to improve recall but may adversely affect precision. High-quality feature selection is necessary to maintain high precision, but here we lack the labeled training data for evaluating features that is available in supervised learning. We present a new feature selection method inspired by pseudo-relevance feedback. We use the top-ranked and bottom-ranked documents retrieved by the bag-of-words method as representative sets of relevant and non-relevant documents. The generated features are then evaluated and filtered on the basis of these sets. Experiments on TREC data confirm the superior performance of our method compared to the previous state of the art.
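
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how features are scored, so the scoring rule here (difference in mean concept weight between the two pseudo-labeled sets) is an assumption, as are the function name `select_features`, the parameters `k` and `n_keep`, and the toy concept weights.

```python
from collections import defaultdict


def select_features(ranked_docs, doc_concepts, k=2, n_keep=2):
    """Keep generated concepts that favor the pseudo-relevant documents.

    ranked_docs:  document ids ordered best-first by a bag-of-words retriever.
    doc_concepts: document id -> {concept name: weight} (generated features).
    k:            size of the top-ranked and bottom-ranked sets (assumed).
    n_keep:       maximum number of concepts to keep (assumed).
    """
    if 2 * k > len(ranked_docs):
        raise ValueError("top and bottom sets would overlap")

    # Top-ranked documents stand in for relevant ones, bottom-ranked
    # documents for non-relevant ones, since no labels are available.
    pseudo_relevant = ranked_docs[:k]
    pseudo_nonrelevant = ranked_docs[-k:]

    def mean_weights(doc_ids):
        totals = defaultdict(float)
        for doc_id in doc_ids:
            for concept, weight in doc_concepts.get(doc_id, {}).items():
                totals[concept] += weight
        return {concept: total / len(doc_ids) for concept, total in totals.items()}

    pos = mean_weights(pseudo_relevant)
    neg = mean_weights(pseudo_nonrelevant)

    # Assumed scoring rule: mean weight in the pseudo-relevant set minus
    # mean weight in the pseudo-non-relevant set.
    scores = {c: pos.get(c, 0.0) - neg.get(c, 0.0) for c in set(pos) | set(neg)}

    # Sort by descending score, breaking ties by name, and drop concepts
    # that do not favor the pseudo-relevant set.
    ordered = sorted(scores, key=lambda c: (-scores[c], c))
    return [c for c in ordered if scores[c] > 0][:n_keep]


# Toy data: a bag-of-words ranking and hypothetical concept weights.
ranked = ["d1", "d2", "d3", "d4", "d5"]
concepts = {
    "d1": {"Jaguar (car)": 0.9, "Automobile": 0.5},
    "d2": {"Jaguar (car)": 0.7, "Big cat": 0.1},
    "d3": {"Automobile": 0.2},
    "d4": {"Big cat": 0.8},
    "d5": {"Big cat": 0.6, "Automobile": 0.1},
}
selected = select_features(ranked, concepts, k=2, n_keep=2)
print(selected)
```

In this toy ranking, the concept that appears mostly in the bottom-ranked documents is filtered out, while concepts concentrated in the top-ranked documents are kept for use in a second retrieval pass.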