Document Layout Structure Extraction Using Bounding Boxes of Different Entities

Authors:
Jisheng Liang; Jaekyu Ha; Robert M. Haralick; Ihsin T. Phillips
Venue:
Proceedings of the 3rd IEEE Workshop on Applications of Computer Vision (WACV '96)
Year:
1996

Abstract

This paper presents an efficient technique for document page layout structure extraction and classification that analyzes the spatial configuration of the bounding boxes of the different entities in a given image. The algorithm segments the image into a list of homogeneous zones. The classification algorithm then labels each zone as text, table, line-drawing, halftone, ruling, or noise. Within text zones, text-lines and words are extracted, and neighboring text-lines are merged to form text-blocks. The tabular structure is further decomposed into row and column items. Finally, the document layout hierarchy is produced from these extracted entities.
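The abstract says that neighboring text-lines are merged into text-blocks using only bounding-box geometry, but it does not state the merging rule. The following is a minimal sketch of one plausible rule, assuming two text-lines belong to the same block when they are vertically close and horizontally aligned. The `Box` type, the helpers `horizontal_overlap`, `vertical_gap`, and `merge_text_lines`, and the `max_gap` and `min_overlap` thresholds are all hypothetical illustrations, not the authors' method or parameters.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in image coordinates (y grows downward)."""
    x0: int
    y0: int
    x1: int
    y1: int

    def union(self, other: "Box") -> "Box":
        # Smallest box enclosing both boxes.
        return Box(min(self.x0, other.x0), min(self.y0, other.y0),
                   max(self.x1, other.x1), max(self.y1, other.y1))


def horizontal_overlap(a: Box, b: Box) -> float:
    """Overlap of the x-extents as a fraction of the narrower box's width."""
    overlap = min(a.x1, b.x1) - max(a.x0, b.x0)
    narrower = min(a.x1 - a.x0, b.x1 - b.x0)
    return max(0, overlap) / narrower if narrower > 0 else 0.0


def vertical_gap(a: Box, b: Box) -> int:
    """Empty space between the y-extents (0 if they touch or overlap)."""
    return max(0, max(a.y0, b.y0) - min(a.y1, b.y1))


def merge_text_lines(lines: List[Box], max_gap: int = 10,
                     min_overlap: float = 0.5) -> List[Box]:
    """Group text-line boxes into text-block boxes with union-find.

    Hypothetical rule: two lines are linked when their vertical gap is at
    most max_gap and their horizontal overlap is at least min_overlap.
    The defaults are illustrative, not values from the paper.
    """
    parent = list(range(len(lines)))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of lines that satisfies the geometric rule.
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            a, b = lines[i], lines[j]
            if (vertical_gap(a, b) <= max_gap
                    and horizontal_overlap(a, b) >= min_overlap):
                parent[find(i)] = find(j)

    # Each connected group of lines becomes one block: the union of its boxes.
    blocks: Dict[int, Box] = {}
    for i, box in enumerate(lines):
        root = find(i)
        blocks[root] = box if root not in blocks else blocks[root].union(box)

    # Return blocks top to bottom, then left to right.
    return sorted(blocks.values(), key=lambda b: (b.y0, b.x0))


if __name__ == "__main__":
    lines = [
        Box(10, 10, 200, 22),   # first line of a left-column paragraph
        Box(10, 26, 195, 38),   # second line, 4 px below the first
        Box(250, 10, 440, 22),  # a line in a right-hand column
        Box(10, 80, 200, 92),   # a line after a large vertical gap
    ]
    for block in merge_text_lines(lines):
        print(block)
    # Box(x0=10, y0=10, x1=200, y1=38)
    # Box(x0=250, y0=10, x1=440, y1=22)
    # Box(x0=10, y0=80, x1=200, y1=92)
```

Union-find keeps the grouping transitive, so a paragraph of many lines collapses into a single block even though each line is compared only with its neighbors' geometry. A real system would likely also consider font size and line spacing within the zone; those cues are omitted here for brevity.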