Query-focused multi-document summarization: Automatic data annotations and supervised learning approaches

  • Authors:
  • Yllias Chali; Sadid A. Hasan

  • Affiliations:
  • University of Lethbridge, Lethbridge, Alberta T1K 3M4, Canada. E-mail: chali@cs.uleth.ca, hasan@cs.uleth.ca

  • Venue:
  • Natural Language Engineering
  • Year:
  • 2012

Abstract

In this paper, we apply different supervised learning techniques to build query-focused multi-document summarization systems, where the task is to produce automatic summaries in response to a given query or specific information request stated by the user. A large amount of labeled data is a prerequisite for supervised training, and labeling performed manually by humans is expensive and time-consuming. Automatic labeling is a good remedy to this problem. We employ five different automatic annotation techniques to build extracts from human abstracts, using ROUGE, Basic Element overlap, a syntactic similarity measure, a semantic similarity measure, and the Extended String Subsequence Kernel. The supervised methods we use are Support Vector Machines, Conditional Random Fields, Hidden Markov Models, Maximum Entropy, and two ensemble-based approaches. Through a series of experiments, we analyze the impact of the automatic labeling methods on the performance of the applied supervised methods. To our knowledge, no other study has investigated and compared in depth the effects of using different automatic annotation techniques on different supervised learning approaches in the domain of query-focused multi-document summarization.
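
The abstract describes a two-stage pipeline: document sentences are first labeled automatically by comparing them against the human abstracts with one of the five similarity measures, and the resulting extract/non-extract labels are then used to train a supervised sentence classifier. The sketch below illustrates that idea only in outline and is not the authors' implementation: a simplified unigram-recall score stands in for ROUGE, the feature set (query-term overlap, sentence length, position) and the 0.3 threshold are illustrative assumptions, and scikit-learn's SVC stands in for the paper's SVM setup.

```python
# Minimal sketch (assumptions throughout, not the authors' implementation) of the
# two-stage idea in the abstract:
#   (1) automatically label document sentences as extract-worthy by their overlap
#       with the human abstract (a ROUGE-1-style unigram recall stands in for the
#       paper's five annotation schemes), and
#   (2) train a supervised classifier (scikit-learn's SVC as a stand-in SVM) on
#       those labels. Features, threshold, and all names are illustrative.
import re
from collections import Counter

from sklearn.svm import SVC


def tokens(text):
    """Lowercased word tokens; a crude tokenizer for the sketch."""
    return re.findall(r"\w+", text.lower())


def rouge1_recall(candidate, reference):
    """Fraction of the reference sentence's unigrams covered by the candidate."""
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)


def auto_label(doc_sentences, abstract_sentences, threshold=0.3):
    """Label a sentence 1 (keep in the extract) if it covers enough of any abstract sentence."""
    return [
        1 if max(rouge1_recall(s, a) for a in abstract_sentences) >= threshold else 0
        for s in doc_sentences
    ]


def features(sentence, query, position):
    """Toy feature vector: query-term overlap, sentence length, position in the document."""
    q_terms = set(tokens(query))
    s_terms = tokens(sentence)
    return [sum(w in q_terms for w in s_terms), len(s_terms), position]


# --- toy usage -------------------------------------------------------------
query = "effect of automatic annotation on summarization"
doc_sentences = [
    "Automatic annotation builds training extracts from human abstracts.",
    "The weather was pleasant during the conference.",
    "Supervised learners such as SVMs are trained on the labeled sentences.",
]
abstract_sentences = [
    "We build extracts from human abstracts using automatic annotation.",
    "Supervised methods are trained on the automatically labeled data.",
]

y = auto_label(doc_sentences, abstract_sentences)                 # stage 1: automatic labels
X = [features(s, query, i) for i, s in enumerate(doc_sentences)]  # stage 2: feature vectors

clf = SVC(kernel="linear")
clf.fit(X, y)
print(clf.predict(X))  # which sentences the trained model would place in the extract
```

In the paper itself, the same automatically labeled data are also used to train CRF, HMM, and Maximum Entropy learners as well as two ensemble combinations; the sketch keeps only the SVM stage for brevity.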