Personalized text snippet extraction using statistical language models

Authors:
Qing Li;Yuanzhu Peter Chen
Affiliations:
School of Economic Information Engineering, Southwestern University of Finance and Economics, China;Department of Computer Science, Memorial University of Newfoundland, Canada
Venue:
Pattern Recognition
Year:
2010

Abstract

In knowledge discovery from text databases, extracting and returning a subset of information highly relevant to a user's query is a critical task. In a broader sense, this amounts to identifying personalized patterns, a capability that drives applications such as Web search engine construction, customized text summarization, and automated question answering. The related problem of text snippet extraction has been studied previously in information retrieval. In those studies, the common strategies for extracting and presenting text snippets either process document fragments that have been delimited a priori or use a fixed-size sliding window to highlight the results. In this work, we argue that text snippet extraction can be generalized if the user's intention is better utilized. The generalized approach overcomes the rigidity of existing methods by dynamically returning more flexible start and end positions for text snippets, which are also semantically more coherent. This is achieved by constructing and using statistical language models that effectively capture the commonalities between a document and the user's intention. Experiments indicate that the proposed solutions provide effective personalized information extraction services.
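
To make the contrast with fixed-size windows concrete, the sketch below scores every candidate span within a length range by a smoothed query likelihood and returns the best start and end positions. It is an illustration only and not the model proposed in the paper: the function name `best_snippet`, the parameters `min_len`, `max_len`, and `lam`, and the choice of a unigram model with Jelinek-Mercer smoothing against an add-one smoothed document model are all assumptions made for this example.

```python
import math
from collections import Counter


def best_snippet(tokens, query, min_len=3, max_len=8, lam=0.5):
    """Return (start, end, score) for the span tokens[start:end] with the
    highest smoothed query log-likelihood, or None if no span is long enough.

    Hypothetical helper: a generic query-likelihood sketch, not the
    authors' method.
    """
    doc_counts = Counter(tokens)
    doc_len = len(tokens)
    vocab_size = len(set(tokens) | set(query))

    def background(term):
        # Add-one smoothed document-level probability; never zero,
        # so the logarithm below is always defined.
        return (doc_counts[term] + 1) / (doc_len + vocab_size)

    best = None
    for start in range(doc_len):
        span_counts = Counter()
        # Grow the span one token at a time so counts are updated incrementally.
        for end in range(start + 1, min(start + max_len, doc_len) + 1):
            span_counts[tokens[end - 1]] += 1
            length = end - start
            if length < min_len:
                continue
            # Jelinek-Mercer interpolation of span model and document model.
            score = sum(
                math.log((1 - lam) * span_counts[t] / length + lam * background(t))
                for t in query
            )
            if best is None or score > best[2]:
                best = (start, end, score)
    return best
```

Because the span boundaries are chosen by the score rather than fixed in advance, the returned snippet can shrink around a dense cluster of query terms or widen when they are spread out. The exhaustive search costs on the order of the document length times `max_len` span evaluations, which is acceptable for a sketch but would likely need pruning for long documents.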