Document Compaction for Efficient Query Biased Snippet Generation

  • Authors:
  • Yohannes Tsegay;Simon J. Puglisi;Andrew Turpin;Justin Zobel

  • Affiliations:
  • School of Computer Science and IT, RMIT University, Melbourne, Australia;School of Computer Science and IT, RMIT University, Melbourne, Australia;School of Computer Science and IT, RMIT University, Melbourne, Australia;Dept. Computer Science and Software Engineering, University of Melbourne, Australia

  • Venue:
  • ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval
  • Year:
  • 2009

Abstract

Current web search engines return query-biased snippets for each document they list in a result set. For efficiency, search engines operating on large collections need to cache snippets for common queries, and to cache documents to allow fast generation of snippets for uncached queries. To improve the hit rate on a document cache during snippet generation, we propose and evaluate several schemes for reducing document size, hence increasing the number of documents in the cache. In particular, we argue against further improvements to document compression, and argue for schemes that prune documents based on the a priori likelihood that a sentence will be used as part of a snippet for a given document. Our experiments show that if documents are reduced to less than half their original size, 80% of snippets generated are identical to those generated from the original documents. Moreover, as the pruned, compressed surrogates are smaller, 3-4 times as many documents can be cached.