Keyword spotting in unconstrained handwritten Chinese documents using contextual word model

Authors:
Liang Huang;Fei Yin;Qing-Hu Chen;Cheng-Lin Liu
Affiliations:
School of Electronic Information, Wuhan University, 39 Luoyu Road, Wuhan, Hubei 430079, PR China;National Laboratory of Pattern Recognition (NLPR), Institute of Automation of Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing 100190, PR China;School of Electronic Information, Wuhan University, 39 Luoyu Road, Wuhan, Hubei 430079, PR China;National Laboratory of Pattern Recognition (NLPR), Institute of Automation of Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing 100190, PR China
Venue:
Image and Vision Computing
Year:
2013

Abstract

This paper proposes a method for keyword spotting in off-line Chinese handwritten documents using a contextual word model. The model measures the similarity between the query word and every candidate word in the document by combining a character classifier with geometric and linguistic context. The geometric context model characterizes single-character likeliness and between-character relationships. The linguistic context model exploits the dependency of the word on its external adjacent characters. The combining weights are optimized on training documents. Experiments on the large handwriting database CASIA-HWDB demonstrate the effectiveness of the proposed method and justify the benefits of the geometric and linguistic contexts. Compared to transcription-based text search, the proposed method can provide a higher recall rate, and for spotting words of four characters it provides both higher precision and higher recall.
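
The combination described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulation, so this assumes each component (character classifier, single-character geometry, between-character geometry, linguistic context) yields a log-score for a candidate word and that the scores are merged by a weighted sum and compared against a threshold. The names `CandidateScores`, `word_similarity`, and `spot`, and all numeric values, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class CandidateScores:
    """Hypothetical per-candidate log-scores; field names are illustrative."""
    classifier: float       # character classifier match to the query word
    unary_geometry: float   # single-character likeliness
    binary_geometry: float  # between-character relationship
    linguistic: float       # dependency on external adjacent characters


def word_similarity(scores, weights):
    """Weighted sum of the component scores (an assumed combination rule)."""
    parts = (scores.classifier, scores.unary_geometry,
             scores.binary_geometry, scores.linguistic)
    if len(weights) != len(parts):
        raise ValueError("expected one weight per component score")
    return sum(w * s for w, s in zip(weights, parts))


def spot(candidates, weights, threshold):
    """Return indices of candidates whose similarity reaches the threshold."""
    return [i for i, c in enumerate(candidates)
            if word_similarity(c, weights) >= threshold]


# Illustrative values only; in the paper the weights are optimized on
# training documents rather than chosen by hand.
weights = (1.0, 0.5, 0.5, 0.25)
candidates = [
    CandidateScores(-1.0, -0.5, -0.5, -1.0),  # plausible match
    CandidateScores(-6.0, -2.0, -2.0, -3.0),  # poor match
]
hits = spot(candidates, weights, threshold=-3.0)
```

Raising the threshold in such a scheme trades recall for precision, which is the trade-off the reported comparison with transcription-based search is about.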