Document Compaction for Efficient Query Biased Snippet Generation

Authors:
Yohannes Tsegay;Simon J. Puglisi;Andrew Turpin;Justin Zobel
Affiliations:
School of Computer Science and IT, RMIT University, Melbourne, Australia;School of Computer Science and IT, RMIT University, Melbourne, Australia;School of Computer Science and IT, RMIT University, Melbourne, Australia;Dept. Computer Science and Software Engineering, University of Melbourne, Australia
Venue:
ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval
Year:
2009

Abstract

Current web search engines return query-biased snippets for each document they list in a result set. For efficiency, search engines operating on large collections need to cache snippets for common queries, and to cache documents to allow fast generation of snippets for uncached queries. To improve the hit rate on a document cache during snippet generation, we propose and evaluate several schemes for reducing document size, hence increasing the number of documents in the cache. In particular, we argue against further improvements to document compression, and argue for schemes that prune documents based on the a priori likelihood that a sentence will be used as part of a snippet for a given document. Our experiments show that if documents are reduced to less than half their original size, 80% of snippets generated are identical to those generated from the original documents. Moreover, as the pruned, compressed surrogates are smaller, 3-4 times as many documents can be cached.
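
The pruning idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the a priori likelihood of a sentence is estimated, so the scoring below is purely an assumption for illustration: it combines the average in-document frequency of a sentence's terms with a small bonus for early position. The names `split_sentences`, `static_score`, `prune_document`, and the `keep_fraction` parameter are hypothetical and do not come from the paper.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def static_score(index, sentence, term_counts):
    """Hypothetical query-independent score (an assumption, not the paper's formula)."""
    words = re.findall(r"\w+", sentence.lower())
    if not words:
        return 0.0
    # Average in-document frequency of the sentence's terms.
    significance = sum(term_counts[w] for w in words) / len(words)
    # Mild bonus for sentences near the start of the document.
    position_bonus = 1.0 / (1 + index)
    return significance + position_bonus


def prune_document(text, keep_fraction=0.5):
    """Keep the best-scoring sentences within about keep_fraction of the characters."""
    sentences = split_sentences(text)
    term_counts = Counter(re.findall(r"\w+", text.lower()))
    budget = keep_fraction * sum(len(s) for s in sentences)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: static_score(i, sentences[i], term_counts),
        reverse=True,
    )
    kept, used = [], 0
    for i in ranked:
        # Skip any sentence that would exceed the size budget.
        if used + len(sentences[i]) > budget:
            continue
        kept.append(i)
        used += len(sentences[i])
    # Restore original order so the surrogate still reads like the document.
    return " ".join(sentences[i] for i in sorted(kept))
```

Because the score is independent of any query, a surrogate like this could be computed once at indexing time and stored in the document cache in place of the full text. Snippets would then be generated from the surrogate at query time.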