Learning for Text Summarization Using Labeled and Unlabeled Sentences

Authors:
Massih-Reza Amini;Patrick Gallinari
Affiliations:
-;-
Venue:
ICANN '01 Proceedings of the International Conference on Artificial Neural Networks
Year:
2001

Citing 7

A Classification EM algorithm for clustering and two stochastic versions
Computational Statistics & Data Analysis - Special issue on optimization techniques in statistics

A trainable document summarizer
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval

Combining labeled and unlabeled data with co-training
COLT '98 Proceedings of the eleventh annual conference on Computational learning theory

Machine learning of generic and user-focused summarization
AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence

Extracting sentence segments for text summarization: a machine learning approach
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval

Fast generation of abstracts from general domain text corpora by extracting relevant sentences
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2

The automatic creation of literature abstracts
IBM Journal of Research and Development

Cited 1

Learning Classification with Both Labeled and Unlabeled Data
ECML '02 Proceedings of the 13th European Conference on Machine Learning

Abstract

We describe an original machine learning approach to automatic text summarization that works by extracting the most relevant sentences from a document. Because labeled corpora are difficult to collect for this task, we propose a semi-supervised method that uses a small set of labeled sentences together with a large set of unlabeled documents to improve the performance of summarization systems. We show that this method is an instance of the Classification EM algorithm in the case of Gaussian densities, and that it can also be used in a non-parametric setting. Finally, we provide an empirical evaluation on the Reuters news-wire corpus.
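
The abstract names the Classification EM algorithm with Gaussian densities but gives no details, so the following is only a minimal sketch of that general idea under stated assumptions, not the authors' implementation. It assumes each sentence is reduced to a single real-valued relevance score, that labels are 0 (not in summary) or 1 (in summary), and that each class is modeled by one univariate Gaussian. The function names `gaussian_logpdf`, `fit_class`, and `semi_supervised_cem` are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a univariate Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def fit_class(values, total, min_var=1e-6):
    """Return (prior, mean, variance) estimated from one class's scores."""
    n = len(values)
    mean = sum(values) / n
    # Floor the variance so a class with identical scores stays well defined.
    var = max(sum((v - mean) ** 2 for v in values) / n, min_var)
    return n / total, mean, var


def semi_supervised_cem(labeled, unlabeled, max_iter=100):
    """Hard-assignment EM over labeled (score, label) pairs and unlabeled scores.

    Assumption: `labeled` contains at least one example of each class.
    Returns the per-class (prior, mean, variance) and the unlabeled assignments.
    """
    if {y for _, y in labeled} != {0, 1}:
        raise ValueError("labeled data must contain both classes 0 and 1")

    data = list(labeled)  # the first M-step uses the labeled sentences only
    assignments = None
    params = None
    for _ in range(max_iter):
        # M-step: re-estimate each class from its currently assigned scores.
        params = [
            fit_class([x for x, y in data if y == c], len(data))
            for c in (0, 1)
        ]
        # E-step and C-step: score each unlabeled sentence under both classes
        # and assign it to the class with the larger log posterior.
        new_assignments = []
        for x in unlabeled:
            log_post = [
                math.log(p) + gaussian_logpdf(x, m, v) for p, m, v in params
            ]
            new_assignments.append(0 if log_post[0] >= log_post[1] else 1)
        if new_assignments == assignments:
            break  # the partition is stable, so the procedure has converged
        assignments = new_assignments
        data = list(labeled) + list(zip(unlabeled, assignments))
    return params, assignments
```

Calling `semi_supervised_cem(labeled, unlabeled)` with a handful of labeled scores and a larger list of unlabeled scores returns the fitted class parameters and one hard label per unlabeled score. Labeled sentences keep their given labels throughout; only the unlabeled assignments are revised between iterations.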