Extractive summarization using supervised and semi-supervised learning

Authors:
Kam-Fai Wong; Mingli Wu; Wenjie Li
Affiliations:
Kam-Fai Wong: The Chinese University of Hong Kong, New Territories, Hong Kong
Mingli Wu: The Chinese University of Hong Kong, New Territories, Hong Kong, and The Hong Kong Polytechnic University, Kowloon, Hong Kong
Wenjie Li: The Hong Kong Polytechnic University, Kowloon, Hong Kong
Venue:
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Year:
2008

Abstract

It is difficult to identify sentence importance from a single point of view. In this paper, we propose a learning-based approach that combines various sentence features, categorized as surface, content, relevance, and event features. Surface features relate to extrinsic aspects of a sentence. Content features measure a sentence by its content-conveying words. Event features represent a sentence by the events it contains. Relevance features evaluate a sentence by its relatedness to other sentences. Experiments show that combining these features improves summarization performance significantly. Although the evaluation results are encouraging, the supervised learning approach requires a large amount of labeled data. We therefore investigate co-training, which combines labeled and unlabeled data. Experiments show that this semi-supervised approach achieves performance comparable to its supervised counterpart while saving about half of the labeling time.
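The abstract does not say which classifier is used or how the feature groups are divided between co-training views, so what follows is only a minimal Python sketch of the general co-training idea applied to extractive sentence scoring. Every name in it (`train_logistic`, `co_train`, `summarize`) is hypothetical, the logistic-regression classifier is a swapped-in stand-in, the two views (for example surface plus content features versus relevance plus event features) are an assumed split, and the toy feature values are invented. In each round, the classifier for each view labels the unlabeled sentences it is most confident about and adds them to the shared training set; the final score averages the two views.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Example = Tuple[Vector, int]  # (sentence feature vector, 1 = summary sentence, 0 = not)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class LogisticModel:
    weights: Vector
    bias: float = 0.0

    def prob(self, x: Sequence[float]) -> float:
        # Estimated probability that a sentence belongs in the summary.
        return sigmoid(sum(w * xi for w, xi in zip(self.weights, x)) + self.bias)


def train_logistic(xs: List[Vector], ys: List[int],
                   epochs: int = 200, lr: float = 0.5) -> LogisticModel:
    # Assumed stand-in classifier: batch gradient descent on the log-loss.
    model = LogisticModel([0.0] * len(xs[0]))
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * len(model.weights)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = model.prob(x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        model.weights = [w - lr * g / n for w, g in zip(model.weights, grad_w)]
        model.bias -= lr * grad_b / n
    return model


def co_train(labeled: List[Example], unlabeled: List[Vector], split: int,
             rounds: int = 5, per_round: int = 1
             ) -> Tuple[Callable[[Vector], float], List[Example]]:
    # Hypothetical view split: view A = features[:split], view B = features[split:].
    views = (slice(0, split), slice(split, None))
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in views:
            if not pool:
                break
            # Train this view's classifier on everything labeled so far.
            model = train_logistic([x[view] for x, _ in labeled], [y for _, y in labeled])
            # Self-label the sentences this view is most confident about;
            # both views train on them in later steps.
            pool.sort(key=lambda x: abs(model.prob(x[view]) - 0.5), reverse=True)
            for x in pool[:per_round]:
                labeled.append((x, 1 if model.prob(x[view]) >= 0.5 else 0))
            del pool[:per_round]
    # Final scorer: average the probabilities of the two view classifiers.
    final = [train_logistic([x[v] for x, _ in labeled], [y for _, y in labeled])
             for v in views]

    def score(x: Vector) -> float:
        return sum(m.prob(x[v]) for m, v in zip(final, views)) / len(views)

    return score, labeled


def summarize(sentences: List[str], features: List[Vector],
              score: Callable[[Vector], float], k: int) -> List[str]:
    # Extractive step: keep the k highest-scoring sentences, in document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(features[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    # Toy 4-dimensional feature vectors (invented values).
    labeled = [([1.0, 0.9, 1.0, 0.8], 1), ([0.0, 0.1, 0.1, 0.0], 0),
               ([0.9, 1.0, 0.8, 1.0], 1), ([0.1, 0.0, 0.0, 0.2], 0)]
    unlabeled = [[0.95, 0.9, 0.9, 0.95], [0.05, 0.1, 0.1, 0.05]]
    score, _ = co_train(labeled, unlabeled, split=2)
    print(summarize(["A.", "B.", "C."], [[0.9] * 4, [0.1] * 4, [0.8] * 4], score, k=2))
```

Adding one self-labeled sentence per view per round keeps the sketch short; a fuller co-training setup would typically add a fixed number of positive and negative examples per round and stop when the unlabeled pool or a round budget runs out.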