Approximation algorithms for NP-hard problems
The use of MMR, diversity-based reranking for reordering documents and producing summaries
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Normalized Cuts and Image Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence
From single to multi-document summarization: a prototype system and its evaluation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Learning to paraphrase: an unsupervised approach using multiple-sequence alignment
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Syntax-based alignment of multiple translations: extracting paraphrases and generating new sentences
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Sentence Fusion for Multidocument News Summarization
Computational Linguistics
A formal model for information selection in multi-sentence text extraction
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Syntactic simplification for improving content selection in multi-document summarization
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
The Pyramid Method: Incorporating human content selection variation in summarization evaluation
ACM Transactions on Speech and Language Processing (TSLP)
Towards strict sentence intersection: decoding and evaluation strategies
MTTG '11 Proceedings of the Workshop on Monolingual Text-To-Text Generation
Hi-index | 0.00 |
The task of identifying redundant information in documents that are generated from multiple sources provides a significant challenge for summarization and QA systems. Traditional clustering techniques detect redundancy at the sentential level and do not guarantee the preservation of all information within the document. We discuss an algorithm that generates a novel graph-based representation for a document and then utilizes a set cover approximation algorithm to remove redundant text from it. Our experiments show that this approach offers a significant performance advantage over clustering when evaluated over an annotated dataset.