We analyze a system for retrieving document images on the basis of layout similarity. Layout objects are extracted from each page and represented as an XY tree, and page similarity is computed with a tree-edit distance algorithm. The distinctive feature of the approach is its use of tree grammars to model variations in the tree that arise from segmentation algorithms or from structural differences between documents with similar layouts. A few class-independent grammatical rules are applied to each tree to obtain a reduced tree that is intended to preserve the most relevant features of the page.
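
To make the similarity step concrete, the following is a minimal sketch of an ordered tree-edit distance between two small layout trees. It is not the system's implementation: it uses the textbook forest recursion with memoization rather than any particular optimized algorithm, it assumes unit costs for insertion, deletion, and relabeling, and the node labels ("hcut", "vcut", "text", "image") and the helper names `node`, `forest_distance`, and `tree_edit_distance` are hypothetical choices made for illustration.

```python
from functools import lru_cache


def node(label, *children):
    """Build a tree as (label, children); children is a tuple of trees."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Edit distance between two ordered forests (tuples of trees).

    Assumption: every insert, delete, and relabel costs 1.
    """
    if not f and not g:
        return 0
    if not f:
        # Insert the rightmost root of g; its children take its place.
        _, kids = g[-1]
        return forest_distance(f, g[:-1] + kids) + 1
    if not g:
        # Delete the rightmost root of f; its children take its place.
        _, kids = f[-1]
        return forest_distance(f[:-1] + kids, g) + 1

    (label_f, kids_f), (label_g, kids_g) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + kids_f, g) + 1
    insert = forest_distance(f, g[:-1] + kids_g) + 1
    # Map the two rightmost roots onto each other, then solve the
    # children forests and the remaining left forests independently.
    relabel = 0 if label_f == label_g else 1
    match = (forest_distance(kids_f, kids_g)
             + forest_distance(f[:-1], g[:-1])
             + relabel)
    return min(delete, insert, match)


def tree_edit_distance(a, b):
    """Edit distance between two single trees."""
    return forest_distance((a,), (b,))


# Hypothetical layout trees: internal nodes are cuts, leaves are regions.
page_a = node("hcut", node("text"),
              node("vcut", node("text"), node("image")))
page_b = node("hcut", node("text"),
              node("vcut", node("text"), node("text")))
page_c = node("hcut", node("text"), node("text"), node("image"))

print(tree_edit_distance(page_a, page_b))  # one leaf relabeled
print(tree_edit_distance(page_a, page_c))  # the vcut node deleted
```

In this sketch `page_b` differs from `page_a` by one relabeled leaf, and `page_c` is `page_a` with its inner cut removed, so each pair is a single edit apart. A retrieval system would typically rank stored pages by such a distance to the query page, possibly after normalizing by tree size.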