Fast generation of abstracts from general domain text corpora by extracting relevant sentences

Authors:
Klaus Zechner
Affiliations:
Carnegie Mellon University, Pittsburgh, PA
Venue:
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Year:
1996

Abstract

This paper describes a system for generating text abstracts that relies on a general, purely statistical principle, namely the notion of "relevance", defined in terms of the combination of the tf*idf weights of the words in a sentence. The system generates abstracts from newspaper articles by selecting the "most relevant" sentences and combining them in their original text order. Since neither domain knowledge nor text-type-specific heuristics are involved, the system offers maximal generality and flexibility. It is also fast and can be implemented efficiently for both on-line and off-line use. An experiment shows that recall and precision for the extracted sentences (taking the sentences extracted by human subjects as a baseline) are within the same range as recall and precision when the human subjects are compared with each other; in this sense, the performance of the system is indistinguishable from that of a human abstractor. Finally, the system yields significantly better results than a default "lead" algorithm, which simply chooses the initial sentences of the text.
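The relevance-based selection and its evaluation can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes the word weights are tf*idf weights with idf(w) = log(N / df(w)), that the "combination" is a plain sum over the tokens of a sentence, that term frequencies are counted over the whole article, and that the abstract length k is fixed by the caller. The regex tokenizer and all helper names (`idf_table`, `extract_abstract`, `lead`, `precision_recall`) are hypothetical. The sketch also includes the "lead" baseline and sentence-level precision/recall against a (here invented) human extract.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Crude lowercase word tokenizer (assumption, not from the paper)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(docs):
    """Assumed idf formula: idf(w) = log(N / df(w)) over a background corpus."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {w: math.log(n / c) for w, c in df.items()}


def extract_abstract(sentences, idf, k):
    """Score sentences by summed tf*idf, keep the top k, return them in text order."""
    # Assumption: tf is counted over the whole article, not per sentence.
    tf = Counter(tok for s in sentences for tok in tokenize(s))
    # Assumption: words missing from the idf table contribute 0.
    scores = [sum(tf[t] * idf.get(t, 0.0) for t in tokenize(s)) for s in sentences]
    # Highest score first; ties go to the earlier sentence.
    top = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:k]
    return sorted(top)  # restore original text order


def lead(sentences, k):
    """Baseline: the first k sentences of the text."""
    return list(range(min(k, len(sentences))))


def precision_recall(selected, reference):
    """Sentence-level precision and recall against a reference extract."""
    sel, ref = set(selected), set(reference)
    hits = len(sel & ref)
    precision = hits / len(sel) if sel else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


# Toy usage with invented data.
background = [
    "the market fell sharply today",
    "the team won the final match",
    "rain is expected over the weekend",
]
article = [
    "The city council met on Monday.",
    "The council approved a new budget for schools.",
    "Schools will receive more funding under the budget.",
    "The meeting ended early.",
]
# The article itself is added to the background so its words have df >= 1.
idf = idf_table(background + [" ".join(article)])
picked = extract_abstract(article, idf, k=2)
baseline = lead(article, k=2)
human = [1, 2]  # hypothetical human-selected sentence indices

print(picked)                             # [1, 2]
print(precision_recall(picked, human))    # (1.0, 1.0)
print(precision_recall(baseline, human))  # (0.5, 0.5)
```

Because "the" occurs in every background document, its idf is 0 and it adds nothing to any score, while the repeated content words ("council", "budget", "schools") push the two middle sentences above the opening one, which is exactly the case where the lead baseline loses.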