Latent Dirichlet Allocation and Singular Value Decomposition Based Multi-document Summarization

Authors:
Rachit Arora; Balaraman Ravindran
Venue:
ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Year:
2008

Citing: 0
Cited by: 6

Bayesian Multi-topic Microarray Analysis with Hyperparameter Reestimation
ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications

Toward a Unified Framework for Standard and Update Multi-Document Summarization
ACM Transactions on Asian Language Information Processing (TALIP)

Combining syntax and semantics for automatic extractive single-document summarization
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II

Modelling sequential text with an adaptive topic model
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

TopicDSDR: combining topic decomposition and data reconstruction for summarization
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management

Weighted archetypal analysis of the multi-element graph for query-focused multi-document summarization
Expert Systems with Applications: An International Journal

Abstract

Multi-document summarization deals with computing a summary for a set of related articles that gives the user a general view of the events they describe. One objective is that the summary sentences should cover the different events in the documents while conveying that information in as few sentences as possible. Latent Dirichlet Allocation (LDA) can break these documents down into different topics or events. However, to reduce the information the summary sentences have in common, they need to be orthogonal to each other, since orthogonal vectors have zero cosine similarity, the lowest possible value for vectors with non-negative term weights. Singular Value Decomposition (SVD) yields orthogonal representations of vectors, so by representing sentences as vectors in the LDA mixture-model weighted term domain, we can find sentences that are orthogonal to each other. Thus, LDA finds the different topics in the documents, and SVD finds the sentences that best represent these topics. Finally, we evaluate the algorithms on the DUC 2002 corpus multi-document summarization tasks, using the ROUGE evaluator to score the summaries. Compared with the DUC 2002 winning systems, our algorithms achieved significantly better ROUGE-1 recall scores.
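The pipeline described in the abstract (LDA to find topics, SVD to pick mutually orthogonal sentences) can be sketched in Python with NumPy. The abstract does not give the exact weighting or selection rule, so this is only an illustration under stated assumptions. The hypothetical helper `lda_weighted_matrix` weights each raw term count by that term's probability under the sentence's LDA topic mixture. The hypothetical `svd_select` then walks the right singular vectors in order of decreasing singular value and, for each one, takes the not-yet-chosen sentence with the largest absolute loading, a selection rule common in LSA-based summarization. The topic-word and sentence-topic distributions are assumed to come from an already fitted LDA model; the toy arrays are made up.

```python
import numpy as np


def lda_weighted_matrix(tf, topic_word, sent_topic):
    """Return a terms x sentences matrix weighted by an LDA mixture.

    ASSUMED weighting scheme, not confirmed by the abstract.
    tf:         (n_sentences, n_terms) raw term counts per sentence
    topic_word: (n_topics, n_terms) P(term | topic) from a fitted LDA model
    sent_topic: (n_sentences, n_topics) P(topic | sentence)
    """
    # P(t | s) = sum_k P(k | s) * P(t | k): term probability under each sentence's mixture.
    mixture = sent_topic @ topic_word  # (n_sentences, n_terms)
    # Weight raw counts by the mixture probability, then transpose to terms x sentences.
    return (tf * mixture).T


def svd_select(A, summary_len):
    """Pick one sentence per right singular vector, strongest direction first.

    ASSUMED selection rule (LSA-style), not necessarily the paper's.
    """
    # Rows of Vt are mutually orthogonal; NumPy orders them by decreasing singular value.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    selected = []
    for row in Vt:
        if len(selected) == summary_len:
            break
        scores = np.abs(row)      # the sign of a singular vector is arbitrary
        scores[selected] = -1.0   # never pick the same sentence twice
        selected.append(int(np.argmax(scores)))
    return selected


# Toy data (made up): sentences 0 and 1 share a topic, sentence 2 has its own.
tf = np.array([[2, 1, 0, 0],
               [1, 1, 0, 0],
               [0, 0, 1, 2]], dtype=float)
topic_word = np.array([[0.5, 0.5, 0.0, 0.0],
                       [0.0, 0.0, 0.5, 0.5]])
sent_topic = np.array([[1.0, 0.0],
                       [1.0, 0.0],
                       [0.0, 1.0]])

A = lda_weighted_matrix(tf, topic_word, sent_topic)
print(svd_select(A, 2))  # [0, 2]
```

Because the rows of `Vt` are orthogonal, sentences chosen from different rows tend to share little weighted vocabulary. In the toy data, sentence 1 repeats the topic of sentence 0 and is skipped in favour of sentence 2.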
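The evaluation reports ROUGE-1 recall. As a rough illustration of what that number measures, the hypothetical `rouge1_recall` below counts the reference unigrams that also occur in the candidate summary (each clipped to the candidate's count for that word) and divides by the total number of reference unigrams, summed over all reference summaries. It assumes lowercasing and whitespace tokenization; the official ROUGE toolkit uses its own tokenization and optional stemming and stop-word removal, so its scores will differ.

```python
from collections import Counter


def rouge1_recall(candidate, references):
    """Simplified ROUGE-1 recall (ASSUMED tokenization: lowercase + whitespace split)."""
    cand = Counter(candidate.lower().split())
    matched = total = 0
    for ref in references:
        ref_counts = Counter(ref.lower().split())
        # A reference unigram matches at most as many times as it occurs in the candidate.
        matched += sum(min(n, cand[w]) for w, n in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


print(round(rouge1_recall("the cat sat on the mat", ["the cat was on the mat"]), 4))  # 0.8333
```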