Learning from collective human behavior to introduce diversity in lexical choice

Authors:
Vahed Qazvinian; Dragomir R. Radev
Affiliations:
University of Michigan, Ann Arbor, MI; University of Michigan, Ann Arbor, MI
Venue:
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Year:
2011


Abstract

We analyze collective discourse, a collective human behavior in content generation, and show that it exhibits diversity, a property of general collective systems. Based on extensive analysis, we propose a novel paradigm for designing summary generation systems that reflect the diversity of perspectives seen in real-life collective summarization. We analyze 50 sets of summaries, each set written by humans about the same story or artifact, and investigate the diversity of perspectives across these summaries. We show how different summaries use various phrasal information units (i.e., nuggets) to express the same atomic semantic units, called factoids. Finally, we present a ranker that employs distributional similarities to build a network of words, and captures the diversity of perspectives by detecting communities in this network. Our experiments show that our system outperforms a wide range of other document ranking systems that leverage diversity.
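
To make the ranker described in the abstract more concrete, the following is a minimal Python sketch of the general idea, not the authors' implementation. It rests on several assumptions that do not come from the abstract: distributional similarity is approximated by cosine similarity over document-level co-occurrence counts, community detection is replaced by plain connected components of the thresholded word graph, and the ranking is a greedy pass that prefers documents touching word communities not yet covered. The function names and the `threshold` parameter are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def cooccurrence_vectors(docs):
    """Map each word to counts of the words it shares a document with."""
    vectors = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        for a, b in combinations(words, 2):
            vectors[a][b] += 1
            vectors[b][a] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def word_communities(docs, threshold=0.3):
    """Assign each word a community id.

    Assumption: connected components of the thresholded similarity
    graph stand in for a real community detection algorithm.
    """
    vectors = cooccurrence_vectors(docs)
    words = sorted(vectors)
    neighbors = {w: set() for w in words}
    for a, b in combinations(words, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    community = {}
    next_id = 0
    for start in words:
        if start in community:
            continue
        community[start] = next_id
        stack = [start]
        while stack:  # depth-first traversal of one component
            current = stack.pop()
            for other in neighbors[current]:
                if other not in community:
                    community[other] = next_id
                    stack.append(other)
        next_id += 1
    return community


def rank_for_diversity(docs, threshold=0.3):
    """Return document indices, greedily preferring uncovered communities."""
    community = word_communities(docs, threshold)
    doc_communities = [
        {community[w] for w in set(d.lower().split()) if w in community}
        for d in docs
    ]
    covered = set()
    remaining = list(range(len(docs)))
    order = []
    while remaining:
        # Most newly covered communities first; ties go to the earlier document.
        best = max(remaining, key=lambda i: (len(doc_communities[i] - covered), -i))
        order.append(best)
        covered |= doc_communities[best]
        remaining.remove(best)
    return order
```

Calling `rank_for_diversity` on a list of short texts returns a permutation of their indices in which near-duplicate texts tend to be pushed behind texts that use a different part of the vocabulary. A graph library with a proper community detection routine would be a closer match to what the abstract describes; connected components are used here only to keep the sketch free of dependencies.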