There is accelerating demand for access to the visual content of documents stored in historical and cultural archives. The availability of electronic imaging tools and effective image processing techniques makes it feasible to process the multimedia data in large databases. A framework for content-based retrieval of historical documents in the Ottoman Empire archives is presented. The documents are stored as textual images and compressed by constructing a library (codebook) of the symbols that occur in a document; each symbol in the original image is then replaced with a pointer into the codebook, yielding a compressed representation of the image. Symbols are extracted using features in the wavelet and spatial domains, based on the angular and distance spans of shapes. To perform content-based retrieval in the historical archives, a query is specified as a rectangular region of an input image, and the same symbol-extraction process is applied to the query region. Queries are processed against the codebooks of the documents, and the query images are located in the resulting documents using the pointers in the textual images. Query processing does not require decompressing the images. The framework is also applicable to many other document archives that use different scripts.
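The codebook-and-pointer scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `CompressedDocument`, `compress`, and `query` are hypothetical, symbols are assumed to be already segmented into small binary bitmaps, and exact bitmap equality stands in for the wavelet and spatial shape features, which the abstract does not specify in detail.

```python
from dataclasses import dataclass, field

# Assumption: a symbol is a binarized glyph stored as a tuple of row tuples,
# so that it is hashable. The paper's actual symbol features are not modeled.


@dataclass
class CompressedDocument:
    codebook: list = field(default_factory=list)  # one entry per distinct symbol
    pointers: list = field(default_factory=list)  # (codebook index, x, y) per occurrence


def compress(symbols_with_positions):
    """Build a codebook and replace each symbol occurrence with a pointer."""
    doc = CompressedDocument()
    index_of = {}
    for symbol, x, y in symbols_with_positions:
        if symbol not in index_of:
            index_of[symbol] = len(doc.codebook)
            doc.codebook.append(symbol)
        doc.pointers.append((index_of[symbol], x, y))
    return doc


def query(doc, query_symbols):
    """Locate query symbols using only the codebook and the pointers."""
    targets = set(query_symbols)
    # Match against the codebook; exact equality is a stand-in for feature matching.
    wanted = {i for i, s in enumerate(doc.codebook) if s in targets}
    # Follow the pointers to recover positions; no page image is rebuilt.
    return [(x, y) for idx, x, y in doc.pointers if idx in wanted]


if __name__ == "__main__":
    glyph_a = ((1, 0), (0, 1))
    glyph_b = ((1, 1), (1, 1))
    page = [(glyph_a, 0, 0), (glyph_b, 5, 0), (glyph_a, 10, 0)]
    doc = compress(page)
    print(len(doc.codebook), query(doc, [glyph_a]))
```

In this sketch the query touches only the codebook and the pointer list, which mirrors the claim that query processing does not require decompressing the images. A realistic system would replace the equality test with a similarity measure over extracted features.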