Layout and language: integrating spatial and linguistic knowledge for layout understanding tasks

Authors:
Matthew Hurst; Tetsuya Nasukawa
Affiliations:
IBM Research, Tokyo Research Laboratory; IBM Research, Tokyo Research Laboratory
Venue:
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Year:
2000

Abstract

Complex documents stored in a flat or partially marked-up file format require layout-sensitive preprocessing before any natural language processing can be carried out on their textual content. Contemporary technology for discovering basic textual units relies on either spatial or other content-insensitive methods. However, in many cases the boundaries of the basic textual blocks can only be established with knowledge of both the language and the layout. This paper describes a number of these cases and proposes applying a general method that combines knowledge about language with knowledge about the spatial arrangement of text. We claim that comprehensive understanding of layout can only be achieved by exploiting layout knowledge and language knowledge interdependently.
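To illustrate why the two kinds of knowledge interact, here is a minimal sketch (not the authors' method) of deciding whether two consecutive text lines belong to the same textual block. It assumes each line arrives with a hypothetical bounding box (`top`, `bottom`, `left`, with y growing down the page) and uses made-up thresholds `line_gap` and `indent_tol`; the helper names `linguistic_cue`, `same_block` and `group_blocks` are likewise hypothetical. Clear vertical white space settles the question on layout alone. Otherwise simple linguistic cues (a lowercase continuation after an unfinished sentence, or a short unpunctuated heading-like line) can override the layout evidence, and layout decides only when language offers no evidence.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """A text line with a hypothetical bounding box (y grows down the page)."""
    text: str
    top: float
    bottom: float
    left: float


def linguistic_cue(prev_text: str, next_text: str) -> str:
    """Return 'continue', 'break' or 'neutral' from simple, illustrative language heuristics."""
    p, n = prev_text.rstrip(), next_text.lstrip()
    if not p or not n:
        return "neutral"
    # Lowercase start after an unfinished sentence (this includes a hyphenated word).
    if n[0].islower() and p[-1] not in ".!?":
        return "continue"
    # A short unpunctuated line followed by a capitalised one looks like a heading.
    if p[-1] not in ".!?:;," and n[0].isupper() and len(p.split()) <= 4:
        return "break"
    return "neutral"


def same_block(prev: Line, nxt: Line, line_gap: float = 4.0, indent_tol: float = 10.0) -> bool:
    """Combine a spatial test with the linguistic cue (thresholds are assumptions)."""
    gap = nxt.top - prev.bottom
    if gap > 2 * line_gap:
        return False  # clear white space: layout alone decides
    aligned = gap <= line_gap and abs(nxt.left - prev.left) <= indent_tol
    cue = linguistic_cue(prev.text, nxt.text)
    if cue == "continue":
        return True  # language overrides a shift in the left margin
    if cue == "break":
        return False  # language overrides tight, aligned spacing
    return aligned  # no linguistic evidence: fall back to layout


def group_blocks(lines: list[Line]) -> list[list[Line]]:
    """Greedily attach each line to the current block or start a new one."""
    blocks: list[list[Line]] = []
    for line in lines:
        if blocks and same_block(blocks[-1][-1], line):
            blocks[-1].append(line)
        else:
            blocks.append([line])
    return blocks


lines = [
    Line("1 Introduction", 0, 10, 50),
    Line("Complex documents stored in a flat or partially marked-", 12, 22, 50),
    Line("up file format require layout-sensitive preprocessing", 24, 34, 80),  # indented, e.g. beside a figure
    Line("before any natural language processing can be carried out.", 36, 46, 50),
    Line("Contemporary technology for the discovery of basic", 48, 58, 50),
    Line("2 Related work", 80, 90, 50),
]
blocks = group_blocks(lines)
print([len(b) for b in blocks])  # → [1, 4, 1]
```

In this example a layout-only rule would go wrong twice. It would split the "marked-"/"up" lines because the second one is indented, and it would merge the heading into the paragraph because the spacing is tight. The linguistic cues correct both decisions, while the large gap before "2 Related work" is resolved by layout alone.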