Table extraction using conditional random fields
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Detection, Extraction and Representation of Tables
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
Learning to recognize tables in free text
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Hi-index | 0.00 |
A spanned cell in a table is a single, complete unit that physically occupies multiple columns and/or multiple rows. Spanned cells are common in tables, and they are a signifi- cant cause of error in the extraction of tables from free text documents. In this paper, we present a model for the detection and merging of vertically spanned cells for tables presented in plain text documents. Our model and algorithm are based purely on the layout features of the tables, and they require no semantic understanding of the documents. When tested on the 98 tables appearing in 40 randomly selected documents from a corpus of company announcements from the Australian Stock Exchange (ASX), our algorithm achieves an accuracy of 86.79% in detecting and merging vertically spanned cells.