Scientific table type classification in digital library
Proceedings of the 2012 ACM symposium on Document engineering
Understanding a document's logical components is crucial to many applications, e.g., document classification and data integration. As digital libraries have developed, more people have come to recognize the importance of scientific tables, which present valuable information concisely. Although much previous work on tables focuses on table data extraction, few concrete works address understanding and using the extracted table data. Based on a large-scale quantitative study of scientific papers, we believe that identifying the table author's original purpose can improve the comprehension of table data and facilitate its reuse. In this paper, scientific document tables are classified into three topical categories (background, system/method, and experimental) and two functional categories (commentary and comparison). We apply machine-learning-based methods to the table classification task. Our results demonstrate that the proposed features are effective for classification and that our proposed method significantly outperforms the rule-based baseline.