We describe an original machine learning approach to automatic text summarization that works by extracting the most relevant sentences from a document. Because labeled corpora are difficult to collect for this task, we propose a semi-supervised method that uses a small set of labeled sentences together with a large set of unlabeled documents to improve the performance of summarization systems. We show that this method is an instance of the Classification EM algorithm in the case of Gaussian densities, and that it can also be used in a non-parametric setting. Finally, we provide an empirical evaluation on the Reuters news-wire corpus.
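The semi-supervised idea summarized above can be illustrated with a minimal sketch of a two-class Classification EM (CEM) loop. This is not the authors' implementation. It assumes that each sentence is already represented by a numeric feature vector, that label 1 means "belongs in the summary", that the labeled set contains examples of both classes, and that the two Gaussian class densities share one covariance matrix. All function names (`fit_gaussians`, `log_posteriors`, `semi_supervised_cem`) and the synthetic data are hypothetical.

```python
import numpy as np


def fit_gaussians(X, y, n_classes=2, ridge=1e-6):
    """M-step: class priors, class means, and a shared (pooled) covariance."""
    d = X.shape[1]
    priors = np.array([(y == k).mean() for k in range(n_classes)])
    means = np.array([X[y == k].mean(axis=0) for k in range(n_classes)])
    centered = X - means[y]
    # The small ridge term keeps the covariance invertible (an assumption).
    cov = centered.T @ centered / len(X) + ridge * np.eye(d)
    return priors, means, cov


def log_posteriors(X, priors, means, cov):
    """E-step: class log-posteriors up to a constant shared by all classes."""
    prec = np.linalg.inv(cov)
    diffs = X[:, None, :] - means[None, :, :]            # shape (n, K, d)
    maha = np.einsum("nkd,de,nke->nk", diffs, prec, diffs)
    return np.log(priors)[None, :] - 0.5 * maha


def semi_supervised_cem(X_lab, y_lab, X_unl, max_iter=50):
    """Alternate hard assignment of unlabeled points and parameter updates."""
    # Initialise from the labeled sentences only.
    params = fit_gaussians(X_lab, y_lab)
    y_unl = log_posteriors(X_unl, *params).argmax(axis=1)
    X_all = np.vstack([X_lab, X_unl])
    for _ in range(max_iter):
        # Labeled points keep their given labels; unlabeled ones use current guesses.
        y_all = np.concatenate([y_lab, y_unl])
        params = fit_gaussians(X_all, y_all)
        # C-step: assign each unlabeled point to its most probable class.
        new_y_unl = log_posteriors(X_unl, *params).argmax(axis=1)
        if np.array_equal(new_y_unl, y_unl):
            break                                        # partition is stable
        y_unl = new_y_unl
    return y_unl, params


# Synthetic stand-in for sentence feature vectors (not data from the paper).
rng = np.random.default_rng(0)
neg = rng.normal(loc=-2.0, size=(200, 2))
pos = rng.normal(loc=2.0, size=(200, 2))
X_lab = np.vstack([neg[:5], pos[:5]])
y_lab = np.array([0] * 5 + [1] * 5)
X_unl = np.vstack([neg[5:], pos[5:]])
y_unl, (priors, means, cov) = semi_supervised_cem(X_lab, y_lab, X_unl)
```

The hard argmax assignment is what separates CEM from standard EM, which would weight every unlabeled point by its soft posterior instead. Keeping the labeled sentences fixed in every M-step also guarantees that neither class becomes empty.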