New Feature Sets for Summarization by Sentence Extraction

Authors: Hans van Halteren
Affiliations: -
Venue: IEEE Intelligent Systems
Year: 2003

Abstract

Machine learning feature sets originally developed for authorship attribution can be used for summarization by sentence extraction. In the author's pilot experiment, these feature sets distinguished significantly better between extract and nonextract sentences than a random baseline classifier, but they had to be carefully combined with other features to outperform a positional baseline classifier. In the DUC 2002 competition, an actual combination system trained on 400-word single-document extracts was one of the best performers in the 200- and 400-word multidocument extraction tasks. Further experiments showed that this system could be improved significantly with training material that better reflected the intended task.