This paper describes features and methods for comparing and classifying document images at the level of spatial layout. The methods support document retrieval based on visual similarity and provide fast algorithms for initial document type classification without OCR. A novel feature set called interval encoding is introduced to capture elements of spatial layout. This feature set encodes region layout information in fixed-length vectors by capturing structural characteristics of the image. These vectors are then compared using the Manhattan distance, which allows fast page layout comparison. The paper describes experiments and results on rank-ordering a set of document pages by their layout similarity to a test document. We also demonstrate the usefulness of features derived from interval encoding in a hidden Markov model based page layout classification system that is trainable and extensible. The methods described in the paper can be used in various document retrieval tasks, including visual similarity based retrieval, categorization, and information extraction.
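
The ranking step can be illustrated with a minimal sketch. It assumes that each page has already been reduced to a fixed-length feature vector; the interval encoding itself is not reproduced here, and the vectors, page names, and function names below are hypothetical placeholders rather than anything taken from the paper.

```python
from typing import Dict, List, Sequence


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan (L1) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_by_layout(query: Sequence[float],
                   pages: Dict[str, Sequence[float]]) -> List[str]:
    """Return page names ordered from most to least similar to the query."""
    return sorted(pages, key=lambda name: manhattan(query, pages[name]))


# Hypothetical layout vectors, standing in for real interval-encoded features.
pages = {
    "letter": [0, 2, 4, 1],
    "memo": [3, 3, 0, 0],
    "form": [0, 2, 5, 1],
}
query = [0, 2, 4, 2]

ranking = rank_by_layout(query, pages)
print(ranking)
```

Because every vector has the same length, each comparison costs one pass over the vector, which is consistent with the abstract's emphasis on fast layout comparison.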