Most methods for document image retrieval rely solely on text information to find similar documents. This paper describes a way to use layout information for document image retrieval instead. A new class of distance measures is introduced for documents with Manhattan layouts, based on a two-step procedure: first, the distances between the blocks of two layouts are calculated; then, the blocks of one layout are assigned to the blocks of the other layout in a matching step. Different block distances and matching methods are compared and evaluated using the publicly available MARG database. On this dataset, a nearest-neighbor classifier using the best distance measure determines the layout type correctly in 92.6% of the cases. The experiments show that the best distance measure for this task uses the overlapping area combined with the Manhattan distance of the corner points as the block distance, together with minimum-weight edge cover matching.
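The two-step procedure can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several details are assumptions made for illustration only: blocks are axis-aligned rectangles given as `(x0, y0, x1, y1)`, the way `block_distance` combines overlap with the corner-point distance is hypothetical (the abstract does not give the formula), and the matching step is a simple greedy stand-in that links every block to its nearest block in the other layout. That stand-in yields a valid edge cover, but not necessarily the minimum-weight one evaluated in the paper. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# A block is an axis-aligned rectangle (x0, y0, x1, y1) with x0 <= x1, y0 <= y1.
Block = Tuple[float, float, float, float]


def area(r: Block) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def overlap_area(a: Block, b: Block) -> float:
    """Area of the intersection of two rectangles (0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def corner_manhattan(a: Block, b: Block) -> float:
    """Manhattan distance between corresponding corner points."""
    return sum(abs(p - q) for p, q in zip(a, b))


def block_distance(a: Block, b: Block) -> float:
    """Step 1. ASSUMED combination: corner distance, reduced by relative overlap."""
    larger = max(area(a), area(b))
    ratio = overlap_area(a, b) / larger if larger > 0 else 0.0
    return corner_manhattan(a, b) * (1.0 - ratio)


def layout_distance(p: Sequence[Block], q: Sequence[Block]) -> float:
    """Step 2. Greedy edge cover (a stand-in, NOT minimum-weight edge cover)."""
    if not p or not q:
        raise ValueError("both layouts need at least one block")
    d = [[block_distance(a, b) for b in q] for a in p]
    edges = set()
    for i in range(len(p)):  # every block of p picks its nearest block in q
        edges.add((i, min(range(len(q)), key=lambda j: d[i][j])))
    for j in range(len(q)):  # every block of q picks its nearest block in p
        edges.add((min(range(len(p)), key=lambda i: d[i][j]), j))
    return sum(d[i][j] for i, j in edges)


def classify(query: Sequence[Block],
             labelled: List[Tuple[Sequence[Block], str]]) -> str:
    """Nearest-neighbor classification of a layout by layout distance."""
    return min(labelled, key=lambda item: layout_distance(query, item[0]))[1]


# Toy data (invented for illustration, not taken from MARG).
one_column = [(0, 0, 10, 2), (0, 3, 10, 10)]
two_column = [(0, 0, 4, 10), (5, 0, 10, 10)]
query = [(0, 0, 10, 3), (0, 4, 10, 10)]
label = classify(query, [(one_column, "one-column"), (two_column, "two-column")])
```

Because the set of edges covers every block of both layouts, no block is ignored when two layouts have different numbers of blocks, which is the property an edge cover matching provides over a one-to-one assignment.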