Content-based retrieval of historical Ottoman documents stored as textual images

Authors:
E. Saykol;A. K. Sinop;U. Gudukbay;O. Ulusoy;A. E. Cetin
Affiliations:
Dept. of Comput. Eng., Bilkent Univ., Ankara, Turkey;-;-;-;-
Venue:
IEEE Transactions on Image Processing
Year:
2004

Citing 0
Cited 13

Database research at Bilkent University. ACM SIGMOD Record
Financial Document Image Coding with Regions of Interest Using JPEG2000. ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Extraction of Specified Objects from Binary Images Using Object Based Erosion Transform: Application to Hebrew Calligraphic Manuscripts. ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Retrieval of Ottoman documents. MIR '06 Proceedings of the 8th ACM international workshop on Multimedia information retrieval
Matching ottoman words: an image retrieval approach to historical document indexing. Proceedings of the 6th ACM international conference on Image and video retrieval
Handwritten word-spotting using hidden Markov models and universal vocabularies. Pattern Recognition
Ottoman archives explorer: A retrieval system for digital Ottoman archives. Journal on Computing and Cultural Heritage (JOCCH)
Unsupervised writer adaptation of whole-word HMMs with application to word-spotting. Pattern Recognition Letters
A line-based representation for matching words in historical manuscripts. Pattern Recognition Letters
A web service platform for web-accessible archaeological databases. ISCIS'05 Proceedings of the 20th international conference on Computer and Information Sciences
A database model for querying visual surveillance videos by integrating semantic and low-level features. MIS'05 Proceedings of the 11th international conference on Advances in Multimedia Information Systems
Learning-based word spotting system for Arabic handwritten documents. Pattern Recognition
Keyword spotting in unconstrained handwritten Chinese documents using contextual word model. Image and Vision Computing

Abstract

There is an accelerating demand to access the visual content of documents stored in historical and cultural archives. The availability of electronic imaging tools and effective image processing techniques makes it feasible to process the multimedia data in large databases. A framework for content-based retrieval of historical documents in the Ottoman Empire archives is presented. The documents are stored as textual images, which are compressed by constructing a library (codebook) of the symbols occurring in a document; the symbols in the original image are then replaced with pointers into the codebook to obtain a compressed representation of the image. Features in the wavelet and spatial domains, based on the angular and distance spans of shapes, are used to extract the symbols. To perform content-based retrieval in the historical archives, a query is specified as a rectangular region in an input image, and the same symbol-extraction process is applied to the query region. The queries are processed on the codebook of documents, and the query images are identified in the resulting documents using the pointers in the textual images. The query process does not require decompression of images. The new content-based retrieval framework is also applicable to many other document archives using different scripts.
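
The codebook-and-pointer idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Archive`, `CompressedPage`, `add_page`, and `query` are hypothetical, symbols are assumed to be already segmented into small binary bitmaps, and exact bitmap equality stands in for the wavelet- and spatial-domain feature matching that the paper describes. The sketch only shows how a query can be answered from the codebook and the stored pointers without reconstructing any page image.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# A symbol is assumed to be a small binary bitmap: rows of 0/1 values.
Bitmap = Tuple[Tuple[int, ...], ...]


@dataclass
class CompressedPage:
    # One (codebook index, x, y) pointer per symbol occurrence on the page.
    pointers: List[Tuple[int, int, int]] = field(default_factory=list)


@dataclass
class Archive:
    codebook: List[Bitmap] = field(default_factory=list)
    pages: Dict[str, CompressedPage] = field(default_factory=dict)

    def _index_of(self, symbol: Bitmap, add: bool) -> Optional[int]:
        # Assumption: exact equality replaces feature-based similarity.
        for i, entry in enumerate(self.codebook):
            if entry == symbol:
                return i
        if add:
            self.codebook.append(symbol)
            return len(self.codebook) - 1
        return None

    def add_page(self, name: str, symbols: List[Tuple[Bitmap, int, int]]) -> None:
        # Store the page as pointers into the shared codebook.
        page = CompressedPage()
        for bitmap, x, y in symbols:
            index = self._index_of(bitmap, add=True)
            page.pointers.append((index, x, y))
        self.pages[name] = page

    def query(self, query_symbols: List[Bitmap]) -> Dict[str, List[Tuple[int, int]]]:
        # Match query symbols against the codebook only, then follow pointers.
        wanted = {self._index_of(s, add=False) for s in query_symbols}
        wanted.discard(None)
        hits: Dict[str, List[Tuple[int, int]]] = {}
        for name, page in self.pages.items():
            locations = [(x, y) for idx, x, y in page.pointers if idx in wanted]
            if locations:
                hits[name] = locations
        return hits


# Toy data: two distinct symbols placed on two pages.
A: Bitmap = ((1, 0), (1, 1))
B: Bitmap = ((0, 1), (1, 0))

archive = Archive()
archive.add_page("doc1", [(A, 0, 0), (B, 5, 0), (A, 10, 0)])
archive.add_page("doc2", [(B, 3, 7)])

result = archive.query([A])
print(result)
```

In this toy setup the query touches only the codebook entries and the pointer lists, which mirrors the abstract's claim that retrieval does not require decompressing the stored images. A realistic system would replace the equality test with a similarity threshold on shape features.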