This paper describes methods for document image classification at the spatial layout level. The goal is to develop fast algorithms for initial document type classification without OCR; the initial result can then be verified using more elaborate methods based on more detailed geometric and syntactic models. A novel feature set called interval encoding is introduced to capture elements of spatial layout. This feature set encodes region layout information in fixed-length vectors by capturing structural characteristics of the image. We demonstrate the usefulness of features derived from interval encoding in a hidden Markov model-based page layout classification system that is trainable and extendable.
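The abstract does not define interval encoding, so the following is only an illustrative sketch of how region layout might be turned into fixed-length vectors. It assumes the page has already been reduced to a coarse binary grid (1 for a bin containing content, 0 for whitespace) and replaces each content bin with the length of the horizontal run it belongs to. The grid representation, the run-length rule, and the names `encode_row` and `encode_page` are all assumptions made for illustration, not the paper's definition.

```python
from itertools import groupby


def encode_row(row):
    """Encode one row of a binary layout grid (hypothetical scheme).

    Each content bin (1) is replaced by the length of the run of
    consecutive content bins it sits in; whitespace bins (0) stay 0.
    The output has the same length as the input row.
    """
    encoded = []
    for value, run in groupby(row):
        length = len(list(run))
        encoded.extend([length if value else 0] * length)
    return encoded


def encode_page(grid):
    """Encode every row, giving one fixed-length vector per row."""
    return [encode_row(row) for row in grid]


# Toy 4 x 8 grid: two columns of text, a blank row, then a full-width block.
page = [
    [0, 1, 1, 1, 0, 0, 1, 1],
    [0, 1, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
features = encode_page(page)
```

Under these assumptions, each row vector could serve as one observation in a top-to-bottom sequence, which is the kind of input a hidden Markov model classifier consumes.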