Correlating summarization of multi-source news with k-way graph bi-clustering

Authors:
Ya Zhang;Chao-Hsien Chu;Xiang Ji;Hongyuan Zha
Affiliations:
The Pennsylvania State University, PA;The Pennsylvania State University, PA;NEC Laboratories America, Cupertino, CA;The Pennsylvania State University, PA
Venue:
ACM SIGKDD Explorations Newsletter
Year:
2004

Citing 15

Constructing literature abstracts by computer: techniques and prospects
Information Processing and Management: an International Journal - Special issue on natural language processing and information retrieval

Information extraction
Communications of the ACM

The use of MMR, diversity-based reranking for reordering documents and producing summaries
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval

Finding related pages in the World Wide Web
WWW '99 Proceedings of the eighth international conference on World Wide Web

Summarizing Similarities and Differences Among Related Documents
Information Retrieval

Web mining research: a survey
ACM SIGKDD Explorations Newsletter

Low-Rank Matrix Approximation Using the Lanczos Bidiagonalization Process with Applications
SIAM Journal on Scientific Computing

Evaluating strategies for similarity search on the web
Proceedings of the 11th international conference on World Wide Web

Advances in Automatic Text Summarization
Advances in Automatic Text Summarization

Generating natural language summaries from multiple on-line sources
Computational Linguistics - Special issue on natural language generation

Mining and summarizing customer reviews
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

Multi-document summarization by sentence extraction
NAACL-ANLP-AutoSum '00 Proceedings of the 2000 NAACL-ANLP Workshop on Automatic summarization - Volume 4

A common theory of information fusion from multiple text sources step one: cross-document structure
SIGDIAL '00 Proceedings of the 1st SIGdial workshop on Discourse and dialogue - Volume 10

Automatic summarization of search engine hit lists
RANLPIR '00 Proceedings of the ACL-2000 workshop on Recent advances in natural language processing and information retrieval: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 11

Multi-document summarization by graph search and matching
AAAI'97/IAAI'97 Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence

Cited 7

A document-sensitive graph model for multi-document summarization
Knowledge and Information Systems

Navigating among search results: an information content approach
WISE'07 Proceedings of the 8th international conference on Web information systems engineering

An efficient algorithm for enumerating pseudo cliques
ISAAC'07 Proceedings of the 18th international conference on Algorithms and computation

Summarizing the differences in multilingual news
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval

Joint cluster based co-clustering for clustering ensembles
ADMA'06 Proceedings of the Second international conference on Advanced Data Mining and Applications

Clustering-Based searching and navigation in an online news source
ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval

Language independent query focused snippet generation
CLEF'12 Proceedings of the Third international conference on Information Access Evaluation: multilinguality, multimodality, and visual analytics

Quantified Score

Hi-index	0.01

Abstract

With the emergence of an enormous amount of online news, it is desirable to construct text mining methods that can extract, compare, and highlight the similarities among news articles. In this paper, we explore the research issue and methodology of correlated summarization for a pair of news articles. The algorithm aligns the (sub)topics of the two news articles and summarizes their correlation by sentence extraction. A pair of news articles is modelled as a weighted bipartite graph. A mutual reinforcement principle is applied to identify a dense subgraph of the weighted bipartite graph. Sentences corresponding to the subgraph are well correlated in textual content and convey the dominant shared topic of the pair of news articles. As a further enhancement for lengthy articles, a k-way bi-clustering algorithm can first be used to partition the bipartite graph into several clusters, each containing sentences from the two news reports. These clusters correspond to shared subtopics, and the mutual reinforcement principle can then be applied to extract topic sentences within each subtopic group.
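
The abstract does not spell out how the graph is built or how the mutual reinforcement principle is computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the two sides of the bipartite graph are the sentences of each article, that edge weights are bag-of-words cosine similarities, and that mutual reinforcement can be approximated by a HITS-style power iteration in which a sentence scores highly when it is strongly linked to high-scoring sentences in the other article. All function names are hypothetical, and the k-way bi-clustering step is omitted.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased word counts for one sentence (a deliberately crude tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors; 0.0 if either is empty."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def edge_weights(sentences_a, sentences_b):
    """Assumed weighting: W[i][j] is the similarity of sentence i in A and j in B."""
    bags_a = [bag_of_words(s) for s in sentences_a]
    bags_b = [bag_of_words(s) for s in sentences_b]
    return [[cosine(x, y) for y in bags_b] for x in bags_a]


def normalize(vec):
    """Scale a vector to unit Euclidean length (all-zero vectors are returned as-is)."""
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec] if length else vec


def mutual_reinforcement(weights, iterations=50):
    """Alternate u <- W v and v <- W^T u, normalizing after each step."""
    rows, cols = len(weights), len(weights[0])
    u = [1.0] * rows
    v = [1.0] * cols
    for _ in range(iterations):
        u = normalize([sum(weights[i][j] * v[j] for j in range(cols))
                       for i in range(rows)])
        v = normalize([sum(weights[i][j] * u[i] for i in range(rows))
                       for j in range(cols)])
    return u, v


def top_sentence_pair(sentences_a, sentences_b):
    """Return the highest-scoring sentence from each article."""
    u, v = mutual_reinforcement(edge_weights(sentences_a, sentences_b))
    best_a = max(range(len(u)), key=u.__getitem__)
    best_b = max(range(len(v)), key=v.__getitem__)
    return sentences_a[best_a], sentences_b[best_b]


# Toy input invented for illustration only.
article_a = [
    "The central bank raised interest rates on Tuesday.",
    "Local football fans celebrated a late victory.",
]
article_b = [
    "Interest rates were raised by the central bank.",
    "A new bakery opened downtown.",
]
pair = top_sentence_pair(article_a, article_b)
```

On this toy input the two sentences about interest rates receive the highest scores, so `pair` holds the first sentence of each article. For longer articles, the same scoring could be run separately inside each cluster produced by a bi-clustering step, which is the enhancement the abstract describes.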