Automatic extraction of table metadata from digital documents

Authors:
Ying Liu;Prasenjit Mitra;C. Lee Giles;Kun Bai
Affiliations:
Pennsylvania State University, University Park, Pennsylvania;Pennsylvania State University, University Park, Pennsylvania;Pennsylvania State University, University Park, Pennsylvania;Pennsylvania State University, University Park, Pennsylvania
Venue:
Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
Year:
2006

Abstract

Tables are used to present, list, summarize, and structure important data in documents. In scholarly articles, they are often used to present the relationships among data and to highlight a collection of results obtained from experiments and scientific analysis. In digital libraries, automatically extracting this data and understanding the structure and content of tables are important to many applications. Automatic identification, extraction, and search of table contents can be made more precise with the help of metadata. In this paper, we propose a set of medium-independent table metadata to facilitate table indexing, searching, and exchange. To extract the contents of tables and their metadata, an automatic table metadata extraction algorithm is designed and tested on PDF documents.