Determination of the Script and Language Content of Document Images
IEEE Transactions on Pattern Analysis and Machine Intelligence
Document image similarity and equivalence detection
ICDAR '97 Proceedings of the 4th International Conference on Document Analysis and Recognition
Probabilistic Retrieval of OCR Degraded Text Using N-Grams
ECDL '97 Proceedings of the First European Conference on Research and Advanced Technology for Digital Libraries
Extraction of Indicative Summary Sentences from Imaged Documents
ICDAR '97 Proceedings of the 4th International Conference on Document Analysis and Recognition
Duplicate Detection for Symbolically Compressed Documents
ICDAR '99 Proceedings of the Fifth International Conference on Document Analysis and Recognition
Content-Based Indexing and Retrieval Method of Chinese Document Images
ICDAR '99 Proceedings of the Fifth International Conference on Document Analysis and Recognition
Word Searching in Document Images Using Word Portion Matching
DAS '02 Proceedings of the 5th International Workshop on Document Analysis Systems V
Graphics Recognition - from Re-engineering to Retrieval
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
Indexing and retrieval of words in old documents
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
Information Retrieval in Document Image Databases
IEEE Transactions on Knowledge and Data Engineering
IEEE Transactions on Pattern Analysis and Machine Intelligence
Document Image Retrieval Based on Density Distribution Feature and Key Block Feature
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Font Adaptive Word Indexing of Modern Printed Documents
IEEE Transactions on Pattern Analysis and Machine Intelligence
Document image analysis for active reading
SADPI '07 Proceedings of the 2007 international workshop on Semantically aware document processing and indexing
Retrieval of machine-printed Latin documents through Word Shape Coding
Pattern Recognition
Text image matching without language model using a Hausdorff distance
Information Processing and Management: an International Journal
A word shape coding method for camera-based document images
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Feature string-based intelligent information retrieval from Tamil document images
International Journal of Computer Applications in Technology
Text retrieval from early printed books
Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data
A Document Image Retrieval System
Engineering Applications of Artificial Intelligence
PaperComp 2010: first international workshop on paper computing
Proceedings of the 12th ACM international conference adjunct papers on Ubiquitous computing - Adjunct
A survey of keyword spotting techniques for printed document images
Artificial Intelligence Review
Keyword spotting on korean document images by matching the keyword image
ICADL'05 Proceedings of the 8th international conference on Asian Digital Libraries: implementing strategies and sharing experiences
Efficient word retrieval by means of SOM clustering and PCA
DAS'06 Proceedings of the 7th international conference on Document Analysis Systems
Exploring digital libraries with document image retrieval
ECDL'07 Proceedings of the 11th European conference on Research and Advanced Technology for Digital Libraries
Amharic document image retrieval using morphological coding
Proceedings of the International Conference on Management of Emergent Digital EcoSystems
Near-duplicate document image matching: A graphical perspective
Pattern Recognition
We propose a method for text retrieval from document images without the use of OCR. Documents are segmented into character objects, from which two image features, the Vertical Traverse Density (VTD) and the Horizontal Traverse Density (HTD), are extracted. Based on these features, an n-gram-based document vector is constructed for each document. Text similarity between two documents is then measured as the dot product of their document vectors. Tests on seven corpora of imaged textual documents in English and Chinese, as well as on images from the UW1 database, confirm the validity of the proposed method.
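The pipeline in this abstract (character objects, then traverse-density features, then n-gram document vectors, then a dot product) can be sketched roughly as follows. This is a minimal illustration under stated assumptions and not the authors' implementation. The abstract does not define VTD and HTD precisely, so the sketch assumes that VTD is the per-column count of white-to-black transitions and HTD is the per-row count. It uses the raw pair of count tuples as a character's code, with no quantization, builds bigram frequency vectors, and normalizes them to unit length before taking the dot product. The helper names (`transitions`, `char_code`, `doc_vector`, `similarity`) are hypothetical, and segmentation into character objects is assumed to have already been done.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Sequence, Tuple

# A character object as a binary image: 1 = black pixel, 0 = white pixel.
Glyph = List[List[int]]
# Assumed character code: (per-column VTD counts, per-row HTD counts).
Code = Tuple[Tuple[int, ...], Tuple[int, ...]]


def transitions(line: Sequence[int]) -> int:
    """Count white-to-black transitions along one scan line."""
    count = 0
    prev = 0  # treat the area just outside the glyph as white
    for px in line:
        if px == 1 and prev == 0:
            count += 1
        prev = px
    return count


def char_code(glyph: Glyph) -> Code:
    """Hypothetical code: VTD per column and HTD per row (assumed definitions)."""
    vtd = tuple(transitions(col) for col in zip(*glyph))  # scan each column top to bottom
    htd = tuple(transitions(row) for row in glyph)         # scan each row left to right
    return (vtd, htd)


def doc_vector(glyphs: Sequence[Glyph], n: int = 2) -> Dict[Tuple[Code, ...], float]:
    """Unit-length vector of n-gram frequencies over the character codes."""
    codes = [char_code(g) for g in glyphs]
    grams = Counter(tuple(codes[i:i + n]) for i in range(len(codes) - n + 1))
    norm = sqrt(sum(c * c for c in grams.values()))
    # Documents with fewer than n characters yield an empty vector.
    return {g: c / norm for g, c in grams.items()} if norm else {}


def similarity(a: Dict, b: Dict) -> float:
    """Dot product of two sparse document vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(g, 0.0) for g, w in a.items())


if __name__ == "__main__":
    bar = [[1], [1], [1]]                      # toy glyph: vertical stroke
    ring = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]   # toy glyph: closed loop
    d1 = doc_vector([bar, ring, bar, ring])
    d2 = doc_vector([bar, ring, ring])
    print(round(similarity(d1, d2), 3))  # prints 0.632
```

Because the vectors are normalized here, the dot product is a cosine similarity in [0, 1]. The abstract only says "dot product", so whether the original method normalizes is an open assumption. On real scans, glyphs of the same character rarely produce identical raw counts, so a practical version would need to quantize or resample the VTD/HTD profiles before they are used as codes.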