Tables are ubiquitous in digital libraries. In scientific documents, tables are widely used to present experimental results or statistical data in a condensed fashion. However, current search engines do not support table search. Three obstacles make table search a challenging problem: the difficulty of automatically extracting tables from untagged documents, the lack of a universal table metadata specification, and the limitations of existing ranking schemes. In this paper, we describe TableSeer, a search engine for tables. TableSeer crawls digital libraries, detects tables in documents, extracts table metadata, indexes and ranks tables, and provides a user-friendly search interface. We propose an extensive set of medium-independent metadata for tables that scientists and other users can adopt for representing table information. In addition, we devise a novel page box-cutting method to improve the performance of table detection. Given a query, TableSeer ranks the matched tables using an innovative ranking algorithm, TableRank. TableRank rates each <query, table> pair with a tailored vector space model and a specific term weighting scheme. Overall, TableSeer eliminates the burden of manually extracting table data from digital libraries and enables users to automatically examine tables. We demonstrate the value of TableSeer with empirical studies on scientific documents.
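To make the idea of rating a <query, table> pair with a vector space model more concrete, the following minimal Python sketch scores tables against a query by cosine similarity. It is not the TableRank algorithm itself: the abstract does not give the actual weighting scheme, so the smoothed inverse table frequency, the metadata field names (`caption`, `body`), the per-field weights, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would normalize further.
    return text.lower().split()


def build_idf(tables):
    # Smoothed inverse *table* frequency (an assumed weighting, not TableRank's):
    # terms that occur in fewer tables receive a larger weight.
    n = len(tables)
    df = Counter()
    for table in tables:
        terms = set()
        for field_text in table.values():
            terms.update(tokenize(field_text))
        df.update(terms)
    return {term: math.log(1 + n / count) for term, count in df.items()}


def table_vector(table, idf, field_weights):
    # Each term occurrence contributes idf * weight of the metadata field it
    # appears in (hypothetical fields; unknown fields default to weight 1.0).
    vec = Counter()
    for field, text in table.items():
        weight = field_weights.get(field, 1.0)
        for term in tokenize(text):
            vec[term] += weight * idf.get(term, 0.0)
    return vec


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_tables(query, tables, field_weights):
    # Returns (score, table_index) pairs, best match first.
    idf = build_idf(tables)
    query_vec = Counter()
    for term in tokenize(query):
        query_vec[term] += idf.get(term, 0.0)
    scored = [
        (cosine(query_vec, table_vector(table, idf, field_weights)), index)
        for index, table in enumerate(tables)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank_tables("table detection precision", tables, {"caption": 2.0, "body": 1.0})` on a list of dictionaries keyed by metadata field returns the tables ordered by similarity. Weighting the caption above the body reflects one plausible design choice, namely that some metadata fields describe a table's content more directly than others.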