Using White Space for Automated Document Structuring
Complex documents stored in a flat or partially marked-up file format require layout-sensitive preprocessing before any natural language processing can be carried out on their textual content. Contemporary technology for discovering basic textual units relies on spatial or other content-insensitive methods. However, in many cases knowledge of both the language and the layout is required to establish the boundaries of the basic textual blocks. This paper describes a number of these cases and proposes a general method that combines knowledge about language with knowledge about the spatial arrangement of text. We claim that a comprehensive understanding of layout can only be achieved by exploiting layout knowledge and language knowledge in an interdependent manner.