A comparison of sentence retrieval techniques

  • Authors:
  • Niranjan Balasubramanian; James Allan; W. Bruce Croft

  • Affiliations:
  • University of Massachusetts Amherst, Amherst, MA (all authors)

  • Venue:
  • SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
  • Year:
  • 2007

Abstract

Identifying redundant information in sentences is useful for several applications, such as summarization, document provenance, text reuse detection, and novelty detection. The task is defined as follows: given a query sentence, retrieve sentences from a given collection that express all, or some subset, of the information present in the query sentence. Sentence retrieval techniques rank sentences by some measure of their similarity to the query, so their effectiveness depends on the similarity measure used. An effective retrieval model should handle low word overlap between the query and candidate sentences and go beyond simple word matching. Simple language modeling techniques such as query likelihood retrieval have outperformed TF-IDF and word-overlap-based methods for ranking sentences. In this paper, we compare the performance of sentence retrieval using different language modeling techniques for the problem of identifying redundant information.
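
To illustrate the query likelihood approach the abstract refers to, the sketch below ranks candidate sentences by the log-likelihood of the query sentence under a smoothed sentence language model. This is a minimal sketch under stated assumptions, not the authors' implementation: the whitespace tokenizer, the Dirichlet smoothing parameter `mu`, and the use of the candidate pool itself as the collection model are all simplifications chosen for brevity.

```python
from collections import Counter
import math


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def rank_by_query_likelihood(query_sentence, candidate_sentences, mu=100.0):
    """Score each candidate sentence by log P(query | sentence model),
    with the sentence language model Dirichlet-smoothed against
    collection statistics (here, the candidate pool itself)."""
    query_terms = tokenize(query_sentence)

    # Collection statistics used for smoothing.
    collection_counts = Counter()
    for sent in candidate_sentences:
        collection_counts.update(tokenize(sent))
    collection_len = sum(collection_counts.values())

    scored = []
    for sent in candidate_sentences:
        sent_counts = Counter(tokenize(sent))
        sent_len = sum(sent_counts.values())
        score = 0.0
        for term in query_terms:
            p_coll = (collection_counts.get(term, 0) / collection_len
                      if collection_len else 0.0)
            # Dirichlet-smoothed estimate of P(term | sentence).
            p_term = (sent_counts.get(term, 0) + mu * p_coll) / (sent_len + mu)
            # Small floor for terms unseen in both sentence and collection.
            score += math.log(p_term) if p_term > 0 else math.log(1e-12)
        scored.append((score, sent))

    # Higher log-likelihood means the sentence better "explains" the query.
    return sorted(scored, reverse=True)
```

Smoothing is what lets this style of ranking cope with low word overlap: a candidate sentence is not assigned zero probability just because it misses a query term, which is one reason query likelihood models tend to compare favorably with raw word-overlap measures.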