PIKM 2013: the 6th ACM workshop for ph.d. students in information and knowledge management
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Tables are one of the most common structures for presenting data in documents. However, automatically recognising and extracting tables embedded in documents remains a significant challenge, and the data they contain is still under-utilised. Some common steps can be defined for table extraction. Even so, there is no generic approach that can be applied to different sources and that provides an end-to-end, repeatable workflow. This paper examines the table extraction problem from a process perspective and proposes a table extraction workflow that can be regarded as a plug-and-play architecture. We also present an overview of our complete system, in which the extracted tables are stored and managed. This work considers table extraction in the context of financial statements, but the methods apply generally.
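The abstract describes the workflow only as a plug-and-play architecture built from common table-extraction steps. The following is a minimal Python sketch of that general idea. It is not the authors' system. The stage names (`detect_plain_text`, `split_cells`, `make_store`), the pipe-delimited plain-text input and the in-memory repository are all hypothetical, chosen only to show how interchangeable stages can be chained.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Table:
    """Hypothetical minimal table model: a list of rows of cell strings."""
    rows: List[List[str]]


# A stage maps a list of items to a list of items; any stage can be
# swapped for another with the same input/output shape (plug-and-play).
Stage = Callable[[list], list]


@dataclass
class Workflow:
    stages: List[Stage] = field(default_factory=list)

    def add(self, stage: Stage) -> "Workflow":
        """Append a stage and return self so calls can be chained."""
        self.stages.append(stage)
        return self

    def run(self, inputs: list) -> list:
        """Feed the output of each stage into the next, in order."""
        data = inputs
        for stage in self.stages:
            data = stage(data)
        return data


def detect_plain_text(docs: List[str]) -> List[List[str]]:
    """Hypothetical detector: a table is a run of consecutive lines containing '|'."""
    regions: List[List[str]] = []
    for doc in docs:
        current: List[str] = []
        for line in doc.splitlines():
            if "|" in line:
                current.append(line)
            elif current:
                regions.append(current)
                current = []
        if current:
            regions.append(current)
    return regions


def split_cells(regions: List[List[str]]) -> List[Table]:
    """Hypothetical structure recogniser: split each line on '|' into trimmed cells."""
    return [
        Table(rows=[[cell.strip() for cell in line.strip().strip("|").split("|")]
                    for line in region])
        for region in regions
    ]


def make_store(repository: List[Table]) -> Stage:
    """Hypothetical storage stage: append tables to an in-memory list and pass them on."""
    def store(tables: List[Table]) -> List[Table]:
        repository.extend(tables)
        return tables
    return store


if __name__ == "__main__":
    repo: List[Table] = []
    workflow = Workflow().add(detect_plain_text).add(split_cells).add(make_store(repo))
    doc = "Income statement\nItem | 2012 | 2013\nRevenue | 100 | 120\nNotes follow."
    for table in workflow.run([doc]):
        print(table.rows)  # prints [['Item', '2012', '2013'], ['Revenue', '100', '120']]
```

Each stage agrees only on the shape of its input and output. A detector for another source, such as PDF financial statements, could therefore replace `detect_plain_text` without changing the later stages.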