Fast-changing knowledge on the Internet can be acquired more efficiently with the help of automatic document summarization and updating techniques. This paper describes a novel approach to multi-document update summarization. The best summary is defined as the one with the minimum information distance to the entire document set. The best update summary is the one with the minimum conditional information distance to a document cluster, given that a prior document cluster has already been read. Experiments on the DUC 2007 and TAC 2008 datasets show that the summaries produced by our method correlate closely with human summaries and that the method outperforms other programs, such as LexRank, in many categories under the ROUGE evaluation criterion.
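The two definitions above can be illustrated with a minimal sketch. Information distance is defined in terms of Kolmogorov complexity, which is not computable, so the sketch substitutes zlib compressed length as a rough proxy. That substitution is an assumption made here for illustration; the abstract does not say how the paper approximates the distance. The function names (`c`, `cond`, `distance`, `best_summary`) and the exhaustive search over sentence subsets are likewise hypothetical and not taken from the paper.

```python
import zlib
from itertools import combinations


def c(text: str) -> int:
    """Compressed size in bytes, a rough proxy for Kolmogorov complexity K(text)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def cond(x: str, given: str) -> int:
    """Proxy for conditional complexity K(x | given): C(given + x) - C(given)."""
    return c(given + " " + x) - c(given)


def distance(x: str, y: str, prior: str = "") -> int:
    """Proxy for max{K(x | y, prior), K(y | x, prior)}.

    With an empty prior this stands in for the plain information distance;
    with a non-empty prior it stands in for the conditional distance used
    for update summaries (the prior cluster has already been read).
    """
    return max(cond(x, prior + " " + y), cond(y, prior + " " + x))


def best_summary(sentences, k, prior=""):
    """Exhaustively pick the k sentences whose concatenation is closest
    to the whole cluster. Only feasible for small inputs."""
    cluster = " ".join(sentences)
    return min(
        combinations(sentences, k),
        key=lambda cand: distance(" ".join(cand), cluster, prior),
    )


if __name__ == "__main__":
    # Toy data, invented for this example only.
    earlier = "The city council approved the new bridge project on Monday."
    later = [
        "The city council approved the new bridge project on Monday.",
        "Construction of the bridge is expected to start in early spring.",
        "Local residents raised concerns about traffic during construction.",
    ]
    print(best_summary(later, 1))                 # plain summary
    print(best_summary(later, 1, prior=earlier))  # update summary
```

Compressed length only loosely tracks shared information on short texts, so the selections on toy inputs like these should not be read as meaningful. The sketch is meant to show the shape of the two objectives: the update summary uses the same minimization with the earlier cluster supplied as conditioning context.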