Transcript mapping for handwritten Chinese documents by integrating character recognition model and geometric context

Authors:
Fei Yin; Qiu-Feng Wang; Cheng-Lin Liu
Venue:
Pattern Recognition
Year:
2013

Abstract

Creating document image datasets with ground truth for regions, text lines and characters is a prerequisite for document analysis research. However, ground-truthing large datasets is laborious, time-consuming and error-prone, owing to the difficulty of character segmentation and the large variability of character shape, size and position. This paper describes an effective recognition-based annotation approach for ground-truthing handwritten Chinese documents. The crucial step of annotation is aligning text line images with their text transcript. Under the Bayesian framework, this alignment is formulated as an optimization problem that incorporates a character recognition model and the geometric context of characters. We evaluated alignment performance on the Chinese handwriting database CASIA-HWDB, which contains nearly four million character samples of 7356 classes and 5091 pages of unconstrained handwritten text. The experimental results demonstrate the superiority of recognition-based text line alignment and the benefit of integrating geometric context. On a test set of 1015 handwritten pages (10,449 text lines), the proposed approach achieved a character-level alignment accuracy of 92.32% when under-segmentation errors are counted and 99.04% when they are excluded. A tool based on the proposed approach has been used in practice for labeling handwritten Chinese documents.
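
To make the alignment step concrete, the following is a minimal sketch of how a text line might be aligned with its transcript by dynamic programming. It is not the authors' implementation. It assumes the line has already been over-segmented into primitive segments, that each transcript character covers between one and `max_merge` consecutive segments, and that a combined log-score is available for each (segment run, character) pair. The names `align_transcript`, `score`, `max_merge` and `toy_score` are hypothetical; in a real system `score` would presumably sum the log-outputs of a character classifier and a geometric model, neither of which is specified in the abstract.

```python
from typing import Callable, List, Tuple


def align_transcript(
    n_segments: int,
    transcript: str,
    score: Callable[[int, int, str], float],
    max_merge: int = 4,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Assign each transcript character a run of consecutive segments.

    score(start, end, char) is an assumed log-score for the primitive
    segments [start, end) forming the character `char`.
    Returns the best total score and one (start, end) span per character.
    """
    neg_inf = float("-inf")
    m = len(transcript)
    # best[i][j]: best score aligning the first i characters to the first j segments
    best = [[neg_inf] * (n_segments + 1) for _ in range(m + 1)]
    # back[i][j]: number of segments merged into character i on the best path
    back = [[0] * (n_segments + 1) for _ in range(m + 1)]
    best[0][0] = 0.0

    for i in range(1, m + 1):
        for j in range(i, n_segments + 1):
            for k in range(1, min(max_merge, j) + 1):
                prev = best[i - 1][j - k]
                if prev == neg_inf:
                    continue
                cand = prev + score(j - k, j, transcript[i - 1])
                if cand > best[i][j]:
                    best[i][j] = cand
                    back[i][j] = k

    if best[m][n_segments] == neg_inf:
        raise ValueError("no feasible alignment for this line and transcript")

    # Trace back from the final cell to recover the span of each character.
    spans: List[Tuple[int, int]] = []
    j = n_segments
    for i in range(m, 0, -1):
        k = back[i][j]
        spans.append((j - k, j))
        j -= k
    spans.reverse()
    return best[m][n_segments], spans


def toy_score(start: int, end: int, char: str) -> float:
    """Toy stand-in for the combined recognition and geometric score."""
    truth = {(0, 2, "A"): 0.0, (2, 3, "B"): 0.0, (3, 5, "C"): 0.0}
    return truth.get((start, end, char), -5.0)
```

With the toy scorer, `align_transcript(5, "ABC", toy_score)` assigns segments 0 to 1 to the first character, segment 2 to the second, and segments 3 to 4 to the third. Bounding the run length by `max_merge` keeps the search cost proportional to the number of characters times the number of segments times `max_merge`.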