Extraction of relevant figures and tables for multi-document summarization

Authors:
Ashish Sadh;Amit Sahu;Devesh Srivastava;Ratna Sanyal;Sudip Sanyal
Affiliations:
Indian Institute of Information Technology, Allahabad, India;Indian Institute of Information Technology, Allahabad, India;Indian Institute of Information Technology, Allahabad, India;Indian Institute of Information Technology, Allahabad, India;Indian Institute of Information Technology, Allahabad, India
Venue:
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II
Year:
2012

Abstract

We propose a system that extracts the most relevant figures and tables from a set of topically related source documents and integrates them into the extractive text summary produced from the same set. The proposed method is domain independent. It focuses predominantly on generating a list of candidate units (figures/tables) ranked by their computed relevancy. The relevancy measure is based on local and global scores that account for both direct and indirect references. To test system performance, we created a test collection of document sets that do not adhere to any specific domain. Evaluation experiments show that the system-generated ranked list has a statistically significant correlation with the human evaluators' ranking judgments. Our concluding remarks address the feasibility of using the proposed system to summarize a document set in which figures and tables are salient units.
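
The abstract does not say which correlation measure was used to compare the system's ranked list with the human rankings. As a minimal sketch, assuming Spearman's rank correlation and rankings without ties, the comparison could look like the following. The function name `spearman_rho`, the unit identifiers, and the rank values are all hypothetical and are not taken from the paper.

```python
def spearman_rho(system_rank, human_rank):
    """Spearman's rank correlation for two rankings without ties.

    Both arguments map a candidate unit id (figure or table) to its
    rank, where 1 means most relevant. Assumed measure: the abstract
    does not name the statistic actually used.
    """
    if set(system_rank) != set(human_rank):
        raise ValueError("both rankings must cover the same units")
    n = len(system_rank)
    if n < 2:
        raise ValueError("need at least two units to correlate")
    # Sum of squared rank differences over all candidate units.
    d2 = sum((system_rank[u] - human_rank[u]) ** 2 for u in system_rank)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical ranks for four candidate units (illustrative only).
system = {"fig1": 1, "tab1": 2, "fig2": 3, "tab2": 4}
human = {"fig1": 1, "tab1": 3, "fig2": 2, "tab2": 4}
rho = spearman_rho(system, human)
```

A value near 1 indicates close agreement between the two orderings and a value near -1 indicates reversed orderings. Testing whether the correlation is statistically significant is a separate step that this sketch leaves out.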