Matching Ottoman words: an image retrieval approach to historical document indexing

Authors:
Esra Ataer;Pinar Duygulu
Affiliations:
Bilkent University, Ankara, Turkey;Bilkent University, Ankara, Turkey
Venue:
Proceedings of the 6th ACM international conference on Image and video retrieval
Year:
2007

Abstract

Large archives of Ottoman documents pose a challenge to historians all over the world. These archives remain inaccessible because manually transcribing such a huge volume is difficult. Automatic transcription is required, but due to the characteristics of Ottoman documents, systems based on character recognition may not yield satisfactory results. It is also desirable to store the documents in image form, since they may contain important drawings, especially signatures. For these reasons, in this study we treat the task as an image retrieval problem, viewing Ottoman words as images, and we propose a solution based on image matching techniques. The bag-of-visterms approach, which has been shown to be successful in classifying objects and scenes, is adapted for matching word images. Each word image is represented by a set of visual terms obtained by vector quantization of SIFT descriptors extracted from salient points. Similar words are then matched based on the similarity of the distributions of their visual terms. The experiments are carried out on printed and handwritten documents that include over 10,000 words. The results show that the proposed system is able to retrieve words with high accuracy and to capture the semantic similarities between words.
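
The bag-of-visterms matching described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that local descriptors have already been extracted for each word image (the paper uses SIFT descriptors at salient points; any fixed-length vectors work here), builds the visual vocabulary with plain k-means as one possible form of vector quantization, and compares words by histogram intersection, a similarity measure chosen only for illustration since the abstract does not name one. All function names are hypothetical.

```python
import numpy as np


def quantize(descriptors, centers):
    """Assign each descriptor to the index of its nearest visterm center."""
    # Squared Euclidean distance between every descriptor and every center.
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed quantizer); returns (k, d) centers."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[start].astype(float)
    for _ in range(iters):
        labels = quantize(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    return centers


def visterm_histogram(descriptors, centers):
    """L1-normalized distribution of visterms for one word image."""
    labels = quantize(descriptors, centers)
    counts = np.bincount(labels, minlength=len(centers))
    return counts / counts.sum()


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two L1-normalized histograms (assumed measure)."""
    return float(np.minimum(h1, h2).sum())


if __name__ == "__main__":
    # Synthetic stand-ins for per-word descriptor sets (not real SIFT output).
    rng = np.random.default_rng(1)
    word_a = rng.normal(0.0, 1.0, size=(40, 8))
    word_b = rng.normal(0.0, 1.0, size=(40, 8))
    word_c = rng.normal(5.0, 1.0, size=(40, 8))
    vocab = build_vocabulary(np.vstack([word_a, word_b, word_c]), k=6)
    h_a, h_b, h_c = (visterm_histogram(w, vocab) for w in (word_a, word_b, word_c))
    print("a vs b:", histogram_intersection(h_a, h_b))
    print("a vs c:", histogram_intersection(h_a, h_c))
```

In a retrieval setting, the histogram of a query word would be compared against the histograms of all indexed word images and the results ranked by similarity. Real descriptors could come from any SIFT implementation; the vocabulary size `k` is a free parameter here and is not taken from the paper.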