Extraction of coherent relevant passages using hidden Markov models

Authors:
Jing Jiang; Chengxiang Zhai
Affiliations:
University of Illinois, Urbana, IL; University of Illinois, Urbana, IL
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
2006

Abstract

In information retrieval, retrieving relevant passages, as opposed to whole documents, not only directly benefits the end user by filtering out the irrelevant information within a long relevant document, but also improves retrieval accuracy in general. A critical problem in passage retrieval is to extract coherent relevant passages accurately from a document, which we refer to as passage extraction. While much work has been done on passage retrieval, the passage extraction problem has not been seriously studied. Most existing work tends to rely on presegmenting documents into fixed-length passages, which are unlikely to be optimal because the length of a relevant passage is presumably highly sensitive to both the query and the document.

In this article, we present a new method for accurately detecting coherent relevant passages of variable lengths using hidden Markov models (HMMs). The HMM-based method naturally captures the topical boundaries between passages that are relevant and nonrelevant to the query. Pseudo-feedback mechanisms can be naturally incorporated into such an HMM-based framework to improve parameter estimation. We show that, with appropriate parameter estimation, the HMM method outperforms a number of strong baseline methods on two datasets. We further show how the HMM method can be applied on top of any basic passage extraction method to improve passage boundaries.
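To make the idea of HMM-based passage extraction concrete, the following is a minimal illustrative sketch, not the model from the article. It assumes a simplified two-state HMM (background and relevant) with a symmetric transition probability, an add-one smoothed document unigram model for the background state, and a query/background mixture for the relevant state. The function name `extract_relevant_spans` and the parameters `stay` and `query_weight` are hypothetical and chosen only for illustration. Viterbi decoding labels each token, and maximal runs of the relevant state are returned as variable-length passages.

```python
import math
from collections import Counter


def extract_relevant_spans(doc_tokens, query_tokens, stay=0.9, query_weight=0.5):
    """Label tokens with a toy 2-state HMM and return relevant (start, end) spans.

    Assumed (not from the article): state 0 = background, state 1 = relevant,
    both states keep their label with probability `stay`.
    """
    if not doc_tokens or not query_tokens:
        return []

    vocab = set(doc_tokens) | set(query_tokens)
    doc_counts = Counter(doc_tokens)
    total = len(doc_tokens) + len(vocab)
    # Background emissions: add-one smoothed unigram model of the document.
    background = {w: (doc_counts[w] + 1) / total for w in vocab}
    # Relevant emissions: mixture of the query model and the background model.
    query_counts = Counter(query_tokens)
    relevant = {
        w: query_weight * query_counts[w] / len(query_tokens)
        + (1 - query_weight) * background[w]
        for w in vocab
    }
    emit = (background, relevant)
    log_stay, log_switch = math.log(stay), math.log(1 - stay)

    # Viterbi recursion in log space, starting from a uniform initial distribution.
    score = [math.log(0.5) + math.log(emit[s][doc_tokens[0]]) for s in (0, 1)]
    backpointers = []
    for word in doc_tokens[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            candidates = [
                score[p] + (log_stay if p == s else log_switch) for p in (0, 1)
            ]
            best_prev = 0 if candidates[0] >= candidates[1] else 1
            new_score.append(candidates[best_prev] + math.log(emit[s][word]))
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Backtrace the most likely state sequence.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()

    # Collect maximal runs of the relevant state as half-open spans.
    spans, start = [], None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(path)))
    return spans


doc = ("a b c d e f g h i j hidden markov models hidden markov models "
       "k l m n o p q r s t").split()
print(extract_relevant_spans(doc, ["hidden", "markov", "models"]))  # [(10, 16)]
```

In this toy setting the passage boundaries fall out of the decoding rather than from a fixed window size: a larger `stay` makes state switches more costly and so favors longer, more coherent passages, while a larger `query_weight` makes non-query words inside a passage more costly. Estimating such parameters from data, for example with pseudo-feedback, is outside the scope of this sketch.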