The use of unlabeled data to improve supervised learning for text summarization

Authors:
Massih-Reza Amini;Patrick Gallinari
Affiliations:
University of Pierre and Marie Curie, Paris, France;University of Pierre and Marie Curie, Paris, France
Venue:
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2002

Abstract

With the huge amount of information available electronically, there is an increasing demand for automatic text summarization systems. The use of machine learning techniques for this task allows one to adapt summaries to user needs and to corpus characteristics. These desirable properties have motivated a growing body of work in this field over the last few years. Most approaches generate summaries by extracting sentence segments and adopt the supervised learning paradigm, which requires labeling documents at the text-span level. This is a costly process that strongly limits the applicability of these methods. We investigate the use of semi-supervised algorithms for summarization. These techniques use a small amount of labeled data together with a larger amount of unlabeled data. We propose new semi-supervised algorithms for training classification models for text summarization. We analyze their performance on two data sets: the Reuters news-wire corpus and the Computation and Language (cmp_lg) collection of TIPSTER SUMMAC. We compare them with a baseline non-learning system and a reference trainable summarizer.
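
To make the idea of combining a few labeled sentences with many unlabeled ones concrete, the following is a minimal sketch of a generic self-training loop with hard label assignment. It is not the algorithm proposed in the paper, which the abstract does not specify. The nearest-centroid classifier, the two-dimensional sentence features, and all function names are assumptions chosen for illustration only.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def fit(examples):
    """Return one centroid per class from (features, label) pairs."""
    by_class = {}
    for x, y in examples:
        by_class.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_class.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda y: dist(centroids[y], x))


def self_train(labeled, unlabeled, max_iter=20):
    """Alternate between fitting the classifier and relabeling unlabeled data.

    labeled:   list of (features, label), label 1 = sentence belongs in summary
    unlabeled: list of feature tuples without labels
    """
    # Start from a classifier trained on the labeled sentences only.
    centroids = fit(labeled)
    pseudo = [predict(centroids, x) for x in unlabeled]
    for _ in range(max_iter):
        # Retrain on labeled data plus the current hard pseudo-labels.
        centroids = fit(labeled + list(zip(unlabeled, pseudo)))
        new_pseudo = [predict(centroids, x) for x in unlabeled]
        if new_pseudo == pseudo:  # labels are stable, so stop
            break
        pseudo = new_pseudo
    return centroids, pseudo


# Hypothetical features per sentence, e.g. (term-overlap score, position score).
labeled = [((0.9, 0.8), 1), ((0.1, 0.2), 0)]
unlabeled = [(0.8, 0.9), (0.85, 0.7), (0.2, 0.1), (0.15, 0.3), (0.7, 0.6)]

centroids, pseudo = self_train(labeled, unlabeled)
print(pseudo)
```

In this toy setting only two sentences carry human labels; the remaining five influence the final class centroids through their pseudo-labels, which is the general mechanism by which unlabeled data can refine a classifier trained on scarce annotations.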