This paper describes XDoX, our multi-document summarizer, which is designed to summarize large sets of documents (50 to 500 documents). These sets are typically obtained by running routing or filtering systems against a continuous data stream, such as a newswire. XDoX identifies the most salient, frequently repeated themes within a set and composes an extractive summary that reflects those main themes. The summarizer uses a unique n-gram scoring method that gives greater weight to clusters of passages sharing significant common phrases. Our methods are robust, topic-independent, and easily extensible to multilingual applications. We present example summaries from our own tests and from our participation in the first Document Understanding Conference (DUC).
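The abstract does not give XDoX's actual scoring formula. The following is a minimal sketch of the general idea it describes: rank clusters of passages by the common phrases they share, then extract one representative passage from each top cluster. It assumes the passages have already been clustered. The function names, the n-gram range (2 to 4 words), and the weighting `len(g) * (c - 1)` are all hypothetical choices made for illustration, not the XDoX method.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; a deliberately simple tokenizer (assumption).
    return re.findall(r"[a-z0-9']+", text.lower())


def passage_ngrams(text, n_min=2, n_max=4):
    # Set of all n-grams (as word tuples) of length n_min..n_max in one passage.
    toks = tokenize(text)
    grams = set()
    for n in range(n_min, n_max + 1):
        grams |= {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    return grams


def shared_ngrams(passages, n_min=2, n_max=4):
    # Map each n-gram to the number of distinct passages containing it,
    # keeping only phrases shared by at least two passages.
    counts = Counter()
    for p in passages:
        counts.update(passage_ngrams(p, n_min, n_max))
    return {g: c for g, c in counts.items() if c >= 2}


def cluster_score(passages):
    # Hypothetical weighting: longer shared phrases, and phrases repeated
    # across more passages, contribute more to the cluster's score.
    return sum(len(g) * (c - 1) for g, c in shared_ngrams(passages).items())


def summarize(clusters, k=2):
    # Rank clusters by score (stable sort, so ties keep input order) and,
    # from each of the top k, extract the passage covering the most shared
    # n-grams (max() returns the earliest passage on a tie).
    ranked = sorted(clusters, key=cluster_score, reverse=True)[:k]
    summary = []
    for passages in ranked:
        shared = shared_ngrams(passages)
        summary.append(
            max(passages, key=lambda p: len(shared.keys() & passage_ngrams(p)))
        )
    return summary


if __name__ == "__main__":
    clusters = [
        ["A storm hit the coast.", "Heavy rain fell inland."],
        [
            "The central bank raised interest rates today.",
            "Interest rates were raised by the central bank.",
            "Markets fell after the central bank raised interest rates.",
        ],
    ]
    print(summarize(clusters, k=1))
```

Here the interest-rate cluster outranks the storm cluster because its passages repeat phrases such as "the central bank raised", and the first passage is extracted as the representative because it contains every shared phrase. A real system would also need passage segmentation, clustering, and redundancy control, none of which this sketch attempts.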