Topic-Driven Multi-document Summarization

Authors:
Hongling Wang; Guodong Zhou
Venue:
IALP '10 Proceedings of the 2010 International Conference on Asian Language Processing
Year:
2010

Citing 0
Cited 2

WikiSent: weakly supervised sentiment analysis through extractive summarization with Wikipedia
ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I

Research on Chinese sentence compression for the title generation
CLSW'12 Proceedings of the 13th Chinese conference on Chinese Lexical Semantics

Abstract

This paper presents a topic-driven framework for generating a generic summary from a set of documents. Our approach rests on the intuition that, statistically, the summary's probability distribution over topics should be consistent with that of the source documents. Topics are defined as weighted bags of words and are derived by Latent Dirichlet Allocation (LDA) from a document collection, either the given document set itself or a related large-scale corpus. Text units of any granularity (word, sentence, summary, document, or document set) can then be represented in a single vector space by their probability distributions over the derived topics. This lets us extract a sentence or summary by computing the similarity between its topic distribution and that of the given document set. We propose two similarity-measurement methods, one static and one dynamic: the former detects the salience of information in a static way, while the latter additionally controls redundancy in a dynamic way. In addition, we integrate various popular features to improve performance. Evaluation on the TAC 2008 update summarization task shows encouraging results.
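
The difference between static and dynamic selection can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the similarity measure or say how a summary's topic distribution is formed, so the sketch assumes cosine similarity and takes the summary's distribution to be the mean of its sentences' distributions. The function names (`static_select`, `dynamic_select`) and the toy topic vectors are hypothetical; in practice the vectors would be inferred by an LDA model rather than written by hand.

```python
import math

def cosine(p, q):
    """Cosine similarity between two topic vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def mean_distribution(vectors):
    """Topic distribution of a summary, assumed to be the mean of its sentences."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def static_select(sentence_topics, doc_topics, k):
    """Static sketch: score each sentence once against the document set."""
    ranked = sorted(
        range(len(sentence_topics)),
        key=lambda i: cosine(sentence_topics[i], doc_topics),
        reverse=True,
    )
    return ranked[:k]

def dynamic_select(sentence_topics, doc_topics, k):
    """Dynamic sketch: greedily add the sentence that brings the summary's
    distribution closest to the document set's, re-scoring at every step."""
    chosen = []
    remaining = list(range(len(sentence_topics)))
    while remaining and len(chosen) < k:
        best = max(
            remaining,
            key=lambda i: cosine(
                mean_distribution([sentence_topics[j] for j in chosen + [i]]),
                doc_topics,
            ),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Hypothetical two-topic distributions (hand-made, not LDA output).
doc_topics = [0.6, 0.4]
sentence_topics = [
    [0.9, 0.1],  # sentence 0: mostly topic A
    [0.8, 0.2],  # sentence 1: near-duplicate of sentence 0
    [0.1, 0.9],  # sentence 2: mostly topic B
]

print(static_select(sentence_topics, doc_topics, 2))
print(dynamic_select(sentence_topics, doc_topics, 2))
```

On this toy input the static ranking keeps the two near-duplicate sentences, while the greedy variant replaces one of them with the sentence covering the under-represented topic, which is the kind of redundancy control the abstract attributes to the dynamic method.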