Aggregation of multiple judgments for evaluating ordered lists

  • Authors:
  • Hyun Duk Kim; ChengXiang Zhai; Jiawei Han

  • Affiliations:
  • Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL (all three authors)

  • Venue:
  • ECIR 2010: Proceedings of the 32nd European Conference on Advances in Information Retrieval
  • Year:
  • 2010

Abstract

Many tasks (e.g., search and summarization) produce an ordered list of items. To evaluate such a list, we need to compare it with an ideal ordered list created by a human expert for the same set of items. To reduce bias, multiple human experts are often asked to create multiple ideal ordered lists. A key challenge in such an evaluation is thus how to aggregate these different ideal lists to compute a single score for the ordered list being evaluated. In this paper, we propose three new methods for aggregating multiple order judgments to evaluate ordered lists: weighted correlation aggregation, rank-based aggregation, and frequent sequential pattern-based aggregation. Experimental results on ordering sentences for text summarization show that all three new methods outperform the state-of-the-art average correlation methods in terms of discriminativeness and robustness against noise. Among the three proposed methods, the frequent sequential pattern-based method performs best because it flexibly models agreements and disagreements among human experts at multiple levels of granularity.
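To make the evaluation setup concrete, below is a minimal Python sketch of the average correlation baseline the abstract refers to: a system's ordering is scored by its mean Kendall's tau against each expert's ideal ordering. The function names and toy sentence ids are hypothetical, and the paper's three proposed aggregation methods are only named in the abstract, so they are not reproduced here.

```python
from itertools import combinations

def kendall_tau(order_a, order_b):
    """Kendall's tau between two orderings of the same item set.

    Returns 1.0 for identical orders and -1.0 for exactly reversed
    orders. Assumes no ties and that both lists contain the same items.
    """
    pos_b = {item: i for i, item in enumerate(order_b)}
    concordant = discordant = 0
    # combinations() yields pairs (x, y) with x preceding y in order_a,
    # so a pair is concordant iff x also precedes y in order_b.
    for x, y in combinations(order_a, 2):
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

def average_correlation(system_order, ideal_orders):
    """Baseline aggregation: mean Kendall's tau over all ideal lists."""
    taus = [kendall_tau(system_order, ideal) for ideal in ideal_orders]
    return sum(taus) / len(taus)

# Hypothetical example: three experts order the same four summary sentences.
ideals = [
    ["s1", "s2", "s3", "s4"],
    ["s1", "s3", "s2", "s4"],
    ["s2", "s1", "s3", "s4"],
]
print(average_correlation(["s1", "s2", "s4", "s3"], ideals))  # ~0.444
```

Note that this baseline weights every expert equally regardless of how much the experts agree with one another; that uniform treatment is what the paper's weighted, rank-based, and pattern-based aggregations are designed to relax.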