The indexing and retrieval of document images: a survey
Computer Vision and Image Understanding - Special issue on document image understanding and retrieval
Content-Based Image Retrieval at the End of the Early Years
IEEE Transactions on Pattern Analysis and Machine Intelligence
Modern Information Retrieval
Information Retrieval from Documents: A Survey
Information Retrieval
Comparison and Classification of Documents Based on Layout Similarity
Information Retrieval
Document image database retrieval and browsing using texture analysis
ICDAR '97 Proceedings of the 4th International Conference on Document Analysis and Recognition
Structured Document Segmentation and Representation by the Modified X-Y tree
ICDAR '99 Proceedings of the Fifth International Conference on Document Analysis and Recognition
Encoding of Modified X-Y Trees for Document Classification
ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
Graphics Recognition - from Re-engineering to Retrieval
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
Using tree-grammars for training set expansion in page classification
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
Layout based document image retrieval by means of XY tree reduction
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Information extraction efficiency of business documents captured with smartphones and tablets
Proceedings of the 2013 ACM symposium on Document engineering
Near-duplicate document image matching: A graphical perspective
Pattern Recognition
Document image retrieval can be carried out either by processing the text obtained with OCR or by measuring the layout similarity of the page images. We describe a system for document image retrieval based on layout similarity. The layout is described by means of a tree-based representation, the Modified X-Y (MXY) tree. Each page in the database is represented by a feature vector containing both global features of the page and a vectorial representation of its layout derived from the corresponding MXY tree. To compute similarity, occurrences of tree patterns are handled in the same way as index terms in Information Retrieval. When retrieving relevant documents, the images in the collection are ranked by a measure that combines two values: the similarity of the global features and the similarity of the tree-pattern occurrences. The system is applied to the retrieval of documents belonging to digital libraries. It was tested on two data sets: more than 600 pages from a 19th-century journal, and a collection of monographs printed in the same century, also totalling more than 600 pages.
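The abstract does not specify which tree patterns are counted, how they are weighted, or how the two similarity values are combined. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes that a "pattern" is an internal MXY node's label together with the ordered labels of its children, that pattern counts are weighted with tf-idf and compared with cosine similarity, that global features are compared by an inverse Euclidean distance, and that the final score is a linear blend controlled by a hypothetical weight `alpha`. The node labels (`"H"`, `"V"`, `"text"`, `"image"`) and all function names are likewise hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical MXY-tree node: "H"/"V" for horizontal/vertical cuts,
    # leaf labels such as "text" or "image" for layout regions.
    label: str
    children: list = field(default_factory=list)


def tree_patterns(node):
    """Assumed pattern vocabulary: each internal node plus its ordered child labels."""
    if not node.children:
        return []
    pats = [node.label + "(" + ",".join(c.label for c in node.children) + ")"]
    for child in node.children:
        pats.extend(tree_patterns(child))
    return pats


def idf_table(collection):
    """Inverse document frequency of each pattern, treating patterns like index terms."""
    n = len(collection)
    df = Counter()
    for page in collection:
        df.update(set(page["patterns"]))
    return {p: math.log(n / d) for p, d in df.items()}


def tfidf(patterns, idf):
    return {p: c * idf.get(p, 0.0) for p, c in Counter(patterns).items()}


def cosine(u, v):
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def global_similarity(g1, g2):
    # Assumption: map Euclidean distance between global feature vectors into (0, 1].
    return 1.0 / (1.0 + math.dist(g1, g2))


def combined_score(query, page, idf, alpha=0.5):
    # Assumption: linear combination of global and tree-pattern similarity.
    s_tree = cosine(tfidf(query["patterns"], idf), tfidf(page["patterns"], idf))
    s_glob = global_similarity(query["global"], page["global"])
    return alpha * s_glob + (1 - alpha) * s_tree


def rank(query, collection, idf, alpha=0.5):
    """Return page indices sorted by decreasing combined similarity."""
    return sorted(range(len(collection)),
                  key=lambda i: combined_score(query, collection[i], idf, alpha),
                  reverse=True)


# Toy collection: three pages with hand-made trees and 2-D global features.
trees = [
    Node("H", [Node("text"), Node("V", [Node("text"), Node("image")])]),
    Node("H", [Node("text"), Node("text")]),
    Node("V", [Node("text"), Node("image")]),
]
global_feats = [(0.5, 0.3), (0.9, 0.1), (0.5, 0.35)]
pages = [{"patterns": tree_patterns(t), "global": g} for t, g in zip(trees, global_feats)]
idf = idf_table(pages)

if __name__ == "__main__":
    print(rank(pages[0], pages, idf))
```

Querying with the first page ranks it first, then the third page (which shares the `V(text,image)` sub-layout and has close global features), then the second. Because the patterns are treated as index terms, a pattern that occurs in every page of the collection receives zero idf weight and does not affect the ranking, which mirrors how very common words are discounted in text retrieval.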