This paper describes a system for efficient indexing and retrieval of words in collections of document images. The proposed method is based on two main principles: unsupervised prototype clustering, and string encoding for efficient string matching. During indexing, a self-organizing map (SOM) is trained to cluster together similar symbols (character-like objects) in a subset of the documents to be stored. Using the trained SOM, the words in the whole collection can be stored and represented with a fixed-length description that can be easily compared in order to score the most similar words in response to a user query. The system can be automatically adapted to different languages and font styles. The most appropriate applications are in the processing of old documents (18th and 19th centuries), where current OCR systems have more difficulty. Experimental results describe three application scenarios having various levels of difficulty for current OCR systems.
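
The indexing idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each character-like object has already been reduced to a small numeric feature vector, and every name in it (`train_som`, `best_unit`, `encode_word`, `rank_words`) is hypothetical. The fixed-length description is built here by resampling the sequence of winning SOM grid coordinates to a fixed number of slots, which is an assumption made for illustration, since the abstract does not specify the encoding.

```python
import math
import random


def best_unit(som, x):
    """Return the grid coordinate of the prototype closest to vector x."""
    return min(som, key=lambda u: sum((a - b) ** 2 for a, b in zip(som[u], x)))


def train_som(samples, rows, cols, epochs=20, seed=0):
    """Train a tiny SOM; returns a dict mapping (row, col) -> prototype vector."""
    rng = random.Random(seed)
    dim = len(samples[0])
    som = {(r, c): [rng.random() for _ in range(dim)]
           for r in range(rows) for c in range(cols)}
    total = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(samples, len(samples)):  # shuffled pass over the data
            frac = step / total
            lr = 0.5 * (1.0 - frac)                  # decaying learning rate
            radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5
            win = best_unit(som, x)
            for unit, w in som.items():
                d2 = (unit[0] - win[0]) ** 2 + (unit[1] - win[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius * radius))  # neighbourhood weight
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return som


def encode_word(som, symbols, slots=4):
    """Encode a word (non-empty list of symbol vectors) as 2 * slots numbers.

    Assumption for illustration: the sequence of winning grid coordinates is
    resampled by nearest-neighbour selection to a fixed number of slots.
    """
    units = [best_unit(som, s) for s in symbols]
    code = []
    for k in range(slots):
        row, col = units[k * len(units) // slots]
        code.extend([float(row), float(col)])
    return code


def rank_words(index, query_code):
    """Return (word_id, distance) pairs sorted by Euclidean distance to the query."""
    scored = [(word_id, math.dist(code, query_code)) for word_id, code in index.items()]
    return sorted(scored, key=lambda pair: pair[1])
```

In use, the SOM would be trained on symbols from a subset of the collection, every word would be encoded once with `encode_word` and stored in a dictionary, and a query word image would be encoded the same way and passed to `rank_words`. Because every code has the same length, comparing a query against the index reduces to plain vector distances.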