This paper presents an efficient technique for extracting and classifying document page layout structure by analyzing the spatial configuration of the bounding boxes of the different entities in a page image. The algorithm first segments the image into a list of homogeneous zones, and a classification step then labels each zone as text, table, line drawing, halftone, ruling, or noise. Within text zones, text lines and words are extracted, and neighboring text lines are merged to form text blocks. Table zones are further decomposed into row and column items. Finally, the document layout hierarchy is built from these extracted entities.
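Two of the steps above, merging neighboring text lines into text blocks and decomposing a table into rows and columns, can be illustrated with purely geometric rules on bounding boxes. The following is a minimal sketch under stated assumptions, not the paper's method: the abstract does not give the merging criteria, so the names `Box`, `merge_lines_into_blocks`, and `group_by_projection`, and the thresholds `gap_ratio` and `overlap_ratio`, are hypothetical stand-ins chosen for illustration.

```python
# Hypothetical sketch: the merging rules and thresholds below are assumptions,
# not the criteria used in the paper.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in image coordinates (y grows downward)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        return self.x1 - self.x0

    @property
    def height(self) -> float:
        return self.y1 - self.y0


def union(a: Box, b: Box) -> Box:
    """Smallest box enclosing both a and b."""
    return Box(min(a.x0, b.x0), min(a.y0, b.y0), max(a.x1, b.x1), max(a.y1, b.y1))


def x_overlap(a: Box, b: Box) -> float:
    """Length of the overlap of the two boxes' projections onto the x-axis."""
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))


def merge_lines_into_blocks(lines: List[Box], gap_ratio: float = 1.0,
                            overlap_ratio: float = 0.5) -> List[Box]:
    """Merge neighboring text lines into text blocks (assumed heuristic).

    A line joins an existing block when the vertical gap to that block's last
    line is at most gap_ratio * line height and the two lines overlap
    horizontally by at least overlap_ratio of the narrower line's width.
    """
    blocks: List[List[Box]] = []  # each entry: [block_box, last_line]
    for line in sorted(lines, key=lambda b: (b.y0, b.x0)):
        for entry in blocks:
            block, last = entry
            gap = line.y0 - last.y1
            max_gap = gap_ratio * max(line.height, last.height)
            min_overlap = overlap_ratio * min(line.width, last.width)
            if gap <= max_gap and x_overlap(line, last) >= min_overlap:
                entry[0] = union(block, line)  # grow the block
                entry[1] = line                # remember its lowest line
                break
        else:
            # No compatible block found: start a new one.
            blocks.append([line, line])
    return [entry[0] for entry in blocks]


def group_by_projection(cells: List[Box], axis: str) -> List[List[Box]]:
    """Split table cells into rows (axis="y") or columns (axis="x").

    Cells whose projections onto the chosen axis overlap fall into the same group.
    """
    span = (lambda b: (b.y0, b.y1)) if axis == "y" else (lambda b: (b.x0, b.x1))
    groups: List[List] = []  # each entry: [lo, hi, members]
    for cell in sorted(cells, key=lambda b: span(b)[0]):
        lo, hi = span(cell)
        if groups and lo < groups[-1][1]:
            groups[-1][1] = max(groups[-1][1], hi)
            groups[-1][2].append(cell)
        else:
            groups.append([lo, hi, [cell]])
    return [members for _, _, members in groups]
```

Because each line is compared with the last line of every open block rather than only with the previous line in reading order, lines from two side-by-side columns stay in separate blocks even though they alternate when sorted top to bottom.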