A comparison of sentence retrieval techniques

  • Authors:
  • Niranjan Balasubramanian; James Allan; W. Bruce Croft

  • Affiliations:
  • University of Massachusetts Amherst, Amherst, MA (all authors)

  • Venue:
  • SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
  • Year:
  • 2007

Abstract

Identifying redundant information in sentences is useful for several applications, such as summarization, document provenance, text reuse detection, and novelty detection. The task is defined as follows: given a query sentence, retrieve sentences from a given collection that express all, or some subset, of the information present in the query sentence. Sentence retrieval techniques rank sentences by some measure of their similarity to the query, so their effectiveness depends on the similarity measure used. An effective retrieval model should handle low word overlap between the query and candidate sentences and go beyond simple word matching. Simple language modeling techniques such as query likelihood retrieval have outperformed TF-IDF and word-overlap-based methods for ranking sentences. In this paper, we compare the performance of sentence retrieval using different language modeling techniques for the problem of identifying redundant information.
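
To illustrate the query likelihood approach the abstract refers to, the sketch below ranks candidate sentences by the log-likelihood of the query sentence under a smoothed sentence language model. This is a minimal sketch under stated assumptions, not the authors' implementation: the whitespace tokenizer, the Dirichlet smoothing parameter `mu`, and the use of the candidate pool itself as the collection model are all simplifications chosen for brevity.

```python
from collections import Counter
import math


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def rank_by_query_likelihood(query_sentence, candidate_sentences, mu=100.0):
    """Score each candidate sentence by log P(query | sentence model),
    with the sentence language model Dirichlet-smoothed against
    collection statistics (here, the candidate pool itself)."""
    query_terms = tokenize(query_sentence)

    # Collection statistics used for smoothing.
    collection_counts = Counter()
    for sent in candidate_sentences:
        collection_counts.update(tokenize(sent))
    collection_len = sum(collection_counts.values())

    scored = []
    for sent in candidate_sentences:
        sent_counts = Counter(tokenize(sent))
        sent_len = sum(sent_counts.values())
        score = 0.0
        for term in query_terms:
            p_coll = (collection_counts.get(term, 0) / collection_len
                      if collection_len else 0.0)
            # Dirichlet-smoothed estimate of P(term | sentence).
            p_term = (sent_counts.get(term, 0) + mu * p_coll) / (sent_len + mu)
            # Small floor for terms unseen in both sentence and collection.
            score += math.log(p_term) if p_term > 0 else math.log(1e-12)
        scored.append((score, sent))

    # Higher log-likelihood means the sentence better "explains" the query.
    return sorted(scored, reverse=True)
```

Smoothing is what lets this style of ranking cope with low word overlap: a candidate sentence is not assigned zero probability just because it misses a query term, which is one reason query likelihood models tend to compare favorably with raw word-overlap measures.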