TableSeer: automatic table metadata extraction and searching in digital libraries

Authors:
Ying Liu;Kun Bai;Prasenjit Mitra;C. Lee Giles
Affiliations:
The Pennsylvania State University, University Park, PA;The Pennsylvania State University, University Park, PA;The Pennsylvania State University, University Park, PA;The Pennsylvania State University, University Park, PA
Venue:
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Year:
2007

Citing 17
Cited 22

Term-weighting approaches in automatic text retrieval

Information Processing and Management: an International Journal
TINTIN: a system for retrieval in text tables

DL '97 Proceedings of the second ACM international conference on Digital libraries
A machine learning based approach for table detection on the web

Proceedings of the 11th international conference on World Wide Web
A retargetable table reader

ICDAR '97 Proceedings of the 4th International Conference on Document Analysis and Recognition
Detecting Tables in HTML Documents

DAS '02 Proceedings of the 5th International Workshop on Document Analysis Systems V
Recursive X-Y cut using bounding boxes of connected components

ICDAR '95 Proceedings of the Third International Conference on Document Analysis and Recognition - Volume 2
Table extraction using conditional random fields

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Automatic Table Ground Truth Generation and a Background-Analysis-Based Table Structure Extraction Method

ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
Applying the T-Recs Table Recognition System to the Business Letter Domain

ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
Tabular abstraction, editing, and formatting

Tabular abstraction, editing, and formatting
Graph Grammar Based Analysis System of Complex Table Form Document

ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
Mining tables from large scale HTML texts

COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
A survey of table recognition: Models, observations, transformations, and inferences

International Journal on Document Analysis and Recognition
Learning to recognize tables in free text

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Using visual cues for extraction of tabular data from arbitrary HTML documents

WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Automatic extraction of table metadata from digital documents

Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
Tablerank: a ranking algorithm for table search and retrieval

AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 1

ChemXSeer: a digital library and data repository for chemical kinetics

Proceedings of the ACM first workshop on CyberInfrastructure: information management in eScience
Identifying table boundaries in digital documents via sparse line detection

Proceedings of the 17th ACM conference on Information and knowledge management
Information Extraction

Foundations and Trends in Databases
Automatically generating high quality metadata by analyzing the document code of common file types

Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries
Whetting the appetite of scientists: producing summaries tailored to the citation context

Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries
SCOVO: Using Statistics on the Web of Data

ESWC 2009 Heraklion Proceedings of the 6th European Semantic Web Conference on The Semantic Web: Research and Applications
Test collection management and labeling system

Proceedings of the 9th ACM symposium on Document engineering
Tablerank: a ranking algorithm for table search and retrieval

AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 1
Generating synopses for document-element search

Proceedings of the 18th ACM conference on Information and knowledge management
Document retrieval using image features

Proceedings of the 2010 ACM Symposium on Applied Computing
Invited paper: Supporting browsing-specific information needs: Introducing the Citation-Sensitive In-Browser Summariser

Web Semantics: Science, Services and Agents on the World Wide Web
SeerSuite: developing a scalable and reliable application framework for building digital libraries by crawling the web

WebApps'10 Proceedings of the 2010 USENIX conference on Web application development
An algorithm search engine for software developers

Proceedings of the 3rd International Workshop on Search-Driven Development: Users, Infrastructure, Tools, and Evaluation
An efficient pre-processing method to identify logical components from PDF documents

PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Summarizing figures, tables, and algorithms in scientific publications to augment search results

ACM Transactions on Information Systems (TOIS)
AckSeer: a repository and search engine for automatically extracted acknowledgments from digital libraries

Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
A system for indexing tables, algorithms and figures

Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
Scientific table type classification in digital library

Proceedings of the 2012 ACM symposium on Document engineering
A Web-based resource model for scholarship 2.0: object reuse & exchange

Concurrency and Computation: Practice & Experience
A figure search engine architecture for a chemistry digital library

Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
Schema extraction for tabular data on the web

Proceedings of the VLDB Endowment
"Building a search engine for algorithms" by Suppawong Tuarob, Prasenjit Mitra, and C. Lee Giles with Martin Vesely as coordinator

ACM SIGWEB Newsletter


Abstract

Tables are ubiquitous in digital libraries. In scientific documents, tables are widely used to present experimental results or statistical data in a condensed fashion. However, current search engines do not support table search. The difficulty of automatically extracting tables from untagged documents, the lack of a universal table metadata specification, and the limitations of existing ranking schemes make table search a challenging problem. In this paper, we describe TableSeer, a search engine for tables. TableSeer crawls digital libraries, detects tables in documents, extracts table metadata, indexes and ranks tables, and provides a user-friendly search interface. We propose an extensive set of medium-independent metadata for tables that scientists and other users can adopt for representing table information. In addition, we devise a novel page box-cutting method to improve the performance of table detection. Given a query, TableSeer ranks the matched tables using an innovative ranking algorithm, TableRank. TableRank rates each <query, table> pair with a tailored vector space model and a specific term weighting scheme. Overall, TableSeer eliminates the burden of manually extracting table data from digital libraries and enables users to automatically examine tables. We demonstrate the value of TableSeer with empirical studies on scientific documents.
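
To make the idea of rating a <query, table> pair with a vector space model concrete, the following is a minimal sketch of generic TF-IDF cosine ranking over table descriptions. It is not the TableRank algorithm: the abstract does not specify its tailored model or term weighting scheme, so the smoothed IDF formula, the whitespace tokenizer, the function names, and the sample captions are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_idf(docs):
    """Smoothed inverse document frequency over tokenized table descriptions."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms absent from the collection are ignored."""
    tf = Counter(tokens)
    return {term: freq * idf[term] for term, freq in tf.items() if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_tables(query, tables):
    """Return (table_id, score) pairs sorted by descending similarity to the query."""
    docs = {tid: tokenize(text) for tid, text in tables.items()}
    idf = build_idf(list(docs.values()))
    q_vec = tfidf_vector(tokenize(query), idf)
    scored = [(tid, cosine(q_vec, tfidf_vector(toks, idf))) for tid, toks in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical table captions, invented for illustration only.
TABLES = {
    "t1": "reaction rate constants for chemical kinetics experiments",
    "t2": "precision and recall of table detection methods",
    "t3": "population statistics by region",
}
```

Calling `rank_tables("table detection precision", TABLES)` places `t2` first, since it is the only caption sharing terms with the query. A table-specific scheme would presumably weight terms differently depending on which metadata field they come from, but that detail lies outside what the abstract states.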