Word spotting application in historical Mongolian document images

Authors:
Hongxi Wei; Guanglai Gao
Affiliations:
School of Computer Science, Inner Mongolia University, Hohhot, China; School of Computer Science, Inner Mongolia University, Hohhot, China
Venue:
ICIC '13: Proceedings of the 9th International Conference on Intelligent Computing Theories
Year:
2013

Abstract

This paper proposes a word-spotting framework for indexing and retrieving historical Mongolian document images. Scanned document images are first segmented into word images by preprocessing steps such as binarization and connected component analysis. Each word image then goes through inflectional suffix removal, feature extraction, and conversion to a fixed-length representation, so that every word image is represented by a fixed-length feature vector that serves as an indexing term. At the retrieval stage, the query keyword image is synthesized from a sequence of glyphs according to the spelling rules of the Mongolian language, and it is converted into a fixed-length feature vector by the same procedure. Candidate word images are then returned as a list ranked in descending order of similarity to the query keyword image. Experimental results on the data set demonstrate the feasibility and effectiveness of the proposed framework.
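
The indexing and ranking steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say which features, vector length, or similarity measure the authors use, so everything specific here is an assumption made for illustration: a row-wise projection profile as the feature (rows are used because Mongolian is written vertically), linear interpolation to obtain a fixed length, and cosine similarity for ranking. The function names are hypothetical, and suffix removal and glyph synthesis are not modeled.

```python
import math


def projection_profile(image):
    """Count foreground pixels (value 1) in each row of a binary word image."""
    return [sum(row) for row in image]


def resample(profile, length=8):
    """Linearly interpolate a variable-length profile to `length` samples."""
    if length < 2:
        raise ValueError("length must be at least 2")
    if not profile:
        return [0.0] * length
    if len(profile) == 1:
        return [float(profile[0])] * length
    out = []
    for i in range(length):
        pos = i * (len(profile) - 1) / (length - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(profile) - 1)
        frac = pos - lo
        out.append(profile[lo] * (1 - frac) + profile[hi] * frac)
    return out


def to_vector(image, length=8):
    """Assumed fixed-length representation: a resampled projection profile."""
    return resample(projection_profile(image), length)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_image, index, length=8):
    """Return (word_id, similarity) pairs in descending order of similarity."""
    query_vector = to_vector(query_image, length)
    scored = [(word_id, cosine_similarity(query_vector, vector))
              for word_id, vector in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Toy word images of different heights (hypothetical data, not from the paper).
word_a = [[1, 0, 0], [1, 1, 0], [1, 1, 1], [1, 0, 0]]
word_b = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 1]]

# Indexing stage: every word image becomes a fixed-length vector.
index = {"word_a": to_vector(word_a), "word_b": to_vector(word_b)}

# Retrieval stage: the query is a taller rendering of the same shape as word_a.
query = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 0],
         [1, 1, 1], [1, 1, 1], [1, 0, 0], [1, 0, 0]]
ranking = rank(query, index)
print(ranking)
```

Because every word image maps to a vector of the same length, images of different sizes can be compared directly, which is what allows a synthesized query image to be matched against segmented word images in this sketch.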