Imaged Document Text Retrieval Without OCR
IEEE Transactions on Pattern Analysis and Machine Intelligence
Information Retrieval from Documents: A Survey
Information Retrieval
Comparison and Classification of Documents Based on Layout Similarity
Information Retrieval
IEEE Transactions on Pattern Analysis and Machine Intelligence
Document Image Analysis Using a New Compression Algorithm
DAS '98 Selected Papers from the Third IAPR Workshop on Document Analysis Systems: Theory and Practice
Retrieval of machine-printed Latin documents through Word Shape Coding
Pattern Recognition
Large scalability in document image matching using text retrieval
Pattern Recognition Letters
A hierarchical algorithm is presented for determining the similarity and equivalence of document images. Features extracted from the CCITT fax-compressed representations of two images are compared to determine whether the images are visually similar and whether they are equivalent. Pass codes in the compressed data are used as features. A fixed grid is imposed on the image, and a feature vector is derived from the number of pass codes in each grid cell. The feature vectors are compared to locate a group of documents that are visually similar to the input image. The equivalence of two documents is then determined by applying the Hausdorff distance to the two-dimensional arrangement of pass codes in small patches of each image.
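The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes the pass-code locations have already been decoded from the compressed data into (x, y) pixel coordinates. The function names, the row-major cell ordering, the L1 distance used to compare feature vectors, and the Euclidean point metric inside the Hausdorff distance are all assumptions made for illustration, since the abstract does not specify them.

```python
from math import hypot


def grid_feature_vector(points, width, height, rows, cols):
    """Count points per cell of a fixed rows x cols grid (row-major order).

    `points` are assumed to be (x, y) locations of pass codes, with
    0 <= x <= width and 0 <= y <= height.
    """
    counts = [0] * (rows * cols)
    for x, y in points:
        # Clamp so points on the right or bottom edge fall in the last cell.
        col = min(int(x * cols / width), cols - 1)
        row = min(int(y * rows / height), rows - 1)
        counts[row * cols + col] += 1
    return counts


def l1_distance(u, v):
    """Sum of absolute differences (an assumed choice of vector distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest point in `b`.

    Both point sets must be non-empty.
    """
    return max(min(hypot(ax - bx, ay - by) for bx, by in b) for ax, ay in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

In a hierarchical scheme of this kind, the cheap grid comparison (`l1_distance` over feature vectors) would first narrow the database to a small set of visually similar candidates, and the costlier `hausdorff` comparison would then be applied only to small patches of those candidates to decide equivalence. Any thresholds for either stage would need to be chosen empirically.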