Retrieval by Layout Similarity of Documents Represented with MXY Trees

Authors:
Francesca Cesarini; Simone Marinai; Giovanni Soda
Venue:
DAS '02 Proceedings of the 5th International Workshop on Document Analysis Systems V
Year:
2002



Abstract

Document image retrieval can be carried out either by processing the text obtained with OCR or by measuring the layout similarity of the page images. We describe a system for document image retrieval based on layout similarity. The layout is described by means of a tree-based representation, the Modified X-Y (MXY) tree. Each page in the database is represented by a feature vector containing both global features of the page and a vectorial representation of its layout derived from the corresponding MXY tree. To compute the similarity, occurrences of tree patterns are handled similarly to index terms in Information Retrieval. When retrieving relevant documents, the images in the collection are ranked according to a measure that combines two values: the similarity of the global features and the similarity of the tree-pattern occurrences. The system is applied to the retrieval of documents belonging to digital libraries. It has been tested on more than 600 pages from a 19th-century journal and on a collection of monographs printed in the same century, also containing more than 600 pages.
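The abstract does not specify which tree patterns are extracted, how they are weighted, or how the two similarity values are combined. The following is a minimal sketch of one possible reading, not the authors' method. It assumes: a hypothetical `Node` class for MXY tree nodes; "tree patterns" taken to be depth-1 subtrees (a node label plus the ordered labels of its children); a tf-idf weighting with the variant idf = log(1 + N/df) and cosine similarity, borrowed from the vector space model of Information Retrieval; a global-feature similarity of 1 / (1 + Euclidean distance); and a linear combination controlled by a hypothetical weight `alpha`. All function names are illustrative.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical MXY tree node: a cut label (e.g. 'H', 'V') or a leaf zone type."""
    label: str
    children: list[Node] = field(default_factory=list)


def tree_patterns(root: Node) -> Counter:
    """Count depth-1 patterns (assumed pattern type): parent label + ordered child labels."""
    counts = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            pattern = node.label + "(" + ",".join(c.label for c in node.children) + ")"
            counts[pattern] += 1
            stack.extend(node.children)
    return counts


def idf(collection: list[Counter]) -> dict[str, float]:
    """Inverse document frequency per pattern; the log(1 + N/df) variant is an assumption."""
    n = len(collection)
    df = Counter(p for doc in collection for p in doc)
    return {p: math.log(1 + n / d) for p, d in df.items()}


def weight(counts: Counter, idf_table: dict[str, float]) -> dict[str, float]:
    """tf-idf weights: patterns unseen in the collection get weight 0."""
    return {p: tf * idf_table.get(p, 0.0) for p, tf in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(p, 0.0) for p, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def global_similarity(f: list[float], g: list[float]) -> float:
    """Map the Euclidean distance of global features into (0, 1]."""
    return 1.0 / (1.0 + math.dist(f, g))


def rank(query, pages: dict, alpha: float = 0.5) -> list[str]:
    """pages: name -> (global features, MXY tree root). Returns names by decreasing score."""
    counts = {name: tree_patterns(tree) for name, (_, tree) in pages.items()}
    table = idf(list(counts.values()))
    q_feats, q_tree = query
    q_vec = weight(tree_patterns(q_tree), table)
    scores = {
        name: alpha * global_similarity(q_feats, feats)
        + (1 - alpha) * cosine(q_vec, weight(counts[name], table))
        for name, (feats, _) in pages.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Usage: a two-column page "a" and an image-over-text page "b".
pages = {
    "a": ([0.2, 0.8], Node("H", [Node("text"), Node("V", [Node("text"), Node("text")])])),
    "b": ([0.9, 0.1], Node("H", [Node("image"), Node("text")])),
}
query = ([0.25, 0.75], Node("H", [Node("text"), Node("V", [Node("text"), Node("text")])]))
print(rank(query, pages))  # → ['a', 'b']
```

Treating patterns as index terms lets standard text-retrieval machinery (term weighting, cosine ranking) be reused on layout, while the global features capture page-level properties that the tree patterns ignore; `alpha` trades off the two.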