Construction of model of structured documents based on machine learning
PReMI'11 Proceedings of the 4th international conference on Pattern recognition and machine intelligence
Hi-index | 0.00 |
Intelligent Document Understanding (IDU) is the process of converting scanned document pages into an electronic, processable form. We have previously presented a IDU system architecture suitable for this task which uses a hybrid bottom-up/top-down control strategy. In this paper we focus on a specific subproblem that arises within the chosen framework, concerned with selecting an appropriate page layout structure. A detailed analysis of the problem using an error propagation model, allows computationally simple search strategies to be developed. A multistage layout formation algorithm is proposed and its performance is critically assessed when implemented using two different Layout Object selection criterion. The first selection criterion is based on a maximal column area coverage; the second is based on a probabilistic Layout Object selection. Both techniques have been incorporated into the hybrid IDU system and the results presented indicate its superiority over previously reported systems.