Combining summaries using unsupervised rank aggregation

Authors:
Girish Keshav Palshikar;Shailesh Deshpande;G. Athiappan
Affiliations:
Tata Research Development and Design Centre (TRDDC), Tata Consultancy Services Limited, Pune, India;Tata Research Development and Design Centre (TRDDC), Tata Consultancy Services Limited, Pune, India;Tata Research Development and Design Centre (TRDDC), Tata Consultancy Services Limited, Pune, India
Venue:
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II
Year:
2012

Citing 17
Cited 0

A trainable document summarizer
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval

New Methods in Automatic Extracting
Journal of the ACM (JACM)

Rank aggregation methods for the Web
Proceedings of the 10th international conference on World Wide Web

Generic summarization and keyphrase extraction using mutual reinforcement principle and sentence clustering
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval

KeyGraph: Automatic Indexing by Co-occurrence Graph based on Building Construction Metaphor
ADL '98 Proceedings of the Advances in Digital Libraries Conference

Efficient similarity search and classification via rank aggregation
Proceedings of the 2003 ACM SIGMOD international conference on Management of data

Structured and Unstructured Document Summarization: Design of a Commercial Summarizer using Lexical Chains
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2

Summarization evaluation using relative utility
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management

Lexical cohesion computed by thesaural relations as an indicator of the structure of text
Computational Linguistics

A novel approach to semantic indexing based on concept
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 2

Examining the consensus between human summaries: initial experiments with factoid analysis
HLT-NAACL-DUC '03 Proceedings of the HLT-NAACL 03 on Text summarization workshop - Volume 5

An information-theoretic approach to automatic evaluation of summaries
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics

The Pyramid Method: Incorporating human content selection variation in summarization evaluation
ACM Transactions on Speech and Language Processing (TSLP)

PeRSSonal's core functionality evaluation: Enhancing text labeling through personalized summaries
Data & Knowledge Engineering

ManyAspects: a system for highlighting diverse concepts in documents
Proceedings of the VLDB Endowment

LexRank: graph-based lexical centrality as salience in text summarization
Journal of Artificial Intelligence Research

Keyword extraction from a single document using centrality measures
PReMI'07 Proceedings of the 2nd international conference on Pattern recognition and machine intelligence

Abstract

We model the problem of combining multiple summaries of a given document into a single summary as an instance of the well-known rank aggregation problem. Treating sentences in the document as candidates and summarization algorithms as voters, we determine the winners of an election in which each voter selects k candidates and ranks them in order of preference. Many rank aggregation algorithms are supervised: they learn an optimal rank aggregation function from a training dataset in which each "record" consists of a set of candidate rankings and a model ranking. However, significant disagreements between model summaries created by human experts, together with the high cost of creating them, make it worthwhile to explore unsupervised rank aggregation techniques. We use the well-known Condorcet methodology, including a new variation that improves its suitability for this task. As voters, we include summarization algorithms from the literature and two new ones proposed here: the first is based on keywords, and the second is a variant of the lexical-chain-based algorithm in [1]. We demonstrate experimentally that the combined summary is often very similar, under several different comparison measures, to the model summary produced manually by human experts.
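
The election framing in the abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' own variation of the Condorcet method, so the code below is not their algorithm: it uses plain pairwise-majority counting (Copeland-style scoring) as a stand-in. The function name `condorcet_aggregate`, the tie-breaking by sentence index, and the treatment of sentences a voter did not select (tied below all sentences it did select) are assumptions made only for illustration.

```python
from itertools import combinations


def condorcet_aggregate(rankings, k):
    """Combine top-k sentence rankings by pairwise majority (Copeland-style).

    rankings: one list per voter (summarizer); each list holds sentence
              indices, most preferred first.
    k:        number of sentences to keep in the combined summary.
    """
    candidates = sorted({s for r in rankings for s in r})
    # Position of each sentence in each voter's list.
    positions = [{s: i for i, s in enumerate(r)} for r in rankings]

    def prefers(pos, a, b):
        # Assumption: a sentence the voter did not select sits below
        # every sentence it did select, and ties with other unselected ones.
        return pos.get(a, len(pos)) < pos.get(b, len(pos))

    wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        a_votes = sum(prefers(p, a, b) for p in positions)
        b_votes = sum(prefers(p, b, a) for p in positions)
        if a_votes > b_votes:
            wins[a] += 1
        elif b_votes > a_votes:
            wins[b] += 1

    # More pairwise wins first; ties broken by sentence index (assumption).
    ranked = sorted(candidates, key=lambda c: (-wins[c], c))
    return ranked[:k]


# Three hypothetical summarizers, each ranking its top 3 sentences.
voters = [[3, 0, 7], [0, 3, 5], [0, 7, 3]]
combined = condorcet_aggregate(voters, k=3)
```

Here sentence 0 beats every other sentence in a head-to-head majority vote, so it is the Condorcet winner and leads the combined ranking. Because the method needs only the voters' rankings, no model summaries are required, which is the sense in which the aggregation is unsupervised.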