Geometric Structure Analysis of Document Images: A Knowledge-Based Approach
IEEE Transactions on Pattern Analysis and Machine Intelligence
We propose a new approach to document image layout extraction using rapid feature analysis, preclassification and predictive coding. First, a set of layout features is used to render the image profile information, and a knowledge base is used to map these initial regions to layout labels. Each region found is given a classification tag and a degree of membership in the background, text, picture and line-drawing classes. A predictive coding method is then applied together with the preclassification information to raise the confidence of each label and to integrate the regional domain and the labels into a uniform class without any shape assumption. We have tested our technique on three different databases comprising over 1000 document images. The results show a high degree of confidence in region separation and extraction. The main benefits include robust classification, shape independence and rapid computation.
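The abstract does not say which layout features are used or how the membership degrees are computed. The following is only a rough sketch of the general idea under stated assumptions. It computes horizontal and vertical projection profiles of a binary region, derives two toy features from them (ink density and the fraction of empty rows), and assigns soft memberships to the four classes with the standard fuzzy c-means membership formula (fuzzifier m = 2) against hypothetical class prototypes. The feature choice, the `PROTOTYPES` values and every function name here are illustrative assumptions, not the authors' method.

```python
import math
from typing import Dict, List, Tuple

# Binary region: 1 = ink (foreground), 0 = paper. Assumed non-empty and rectangular.
Image = List[List[int]]

# HYPOTHETICAL class prototypes in (ink density, empty-row ratio) feature space.
# These values are illustrative only; they are not taken from the paper.
PROTOTYPES: Dict[str, Tuple[float, float]] = {
    "background": (0.0, 1.0),
    "text": (0.2, 0.4),
    "picture": (0.5, 0.0),
    "linedrawing": (0.05, 0.1),
}


def projection_profiles(img: Image) -> Tuple[List[int], List[int]]:
    """Return (row_profile, col_profile): ink-pixel counts per row and per column."""
    rows = [sum(r) for r in img]
    cols = [sum(c) for c in zip(*img)]
    return rows, cols


def region_features(img: Image) -> Tuple[float, float]:
    """Toy features from the profiles: overall ink density and the share of empty rows
    (text blocks tend to show blank gaps between lines in the row profile)."""
    rows, _ = projection_profiles(img)
    density = sum(rows) / (len(img) * len(img[0]))
    empty_ratio = sum(1 for r in rows if r == 0) / len(rows)
    return density, empty_ratio


def memberships(features: Tuple[float, float],
                prototypes: Dict[str, Tuple[float, float]] = PROTOTYPES) -> Dict[str, float]:
    """Fuzzy c-means membership (m = 2): u_i = (1/d_i^2) / sum_j (1/d_j^2).
    Memberships sum to 1; an exact prototype match gets membership 1."""
    dists = {k: math.dist(features, p) for k, p in prototypes.items()}
    exact = [k for k, d in dists.items() if d == 0.0]
    if exact:
        return {k: (1.0 if k == exact[0] else 0.0) for k in prototypes}
    inv = {k: 1.0 / d ** 2 for k, d in dists.items()}
    total = sum(inv.values())
    return {k: v / total for k, v in inv.items()}
```

For example, a 4×4 region of alternating full and empty rows has row profile `[4, 0, 4, 0]` and features `(0.5, 0.5)`, and with these hypothetical prototypes its largest membership is in `"text"`, while an all-blank region matches the `"background"` prototype exactly. A real system would use richer features and would refine these preliminary degrees further, which the abstract attributes to its predictive coding stage.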