Improving Chinese/English OCR Performance by Using MCE-based Character-Pair Modeling and Negative Training

Authors:
Qiang Huo; Zhi-Dan Feng
Venue:
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
Year:
2003

Abstract

In the past several years, we have been developing a high-performance OCR engine for machine-printed Chinese/English documents. We have previously reported (1) how to use character modeling techniques based on MCE (minimum classification error) training to achieve high recognition accuracy, and (2) how to use confidence-guided progressive search and fast match techniques to achieve high recognition efficiency. In this paper, we present two more techniques that help reduce search errors and improve the robustness of our character recognizer: (1) using MCE-trained character-pair models to avoid error-prone character-level segmentation in some troublesome cases, and (2) performing MCE-based negative training to improve the rejection capability of the recognition models on hypothesized garbage images during the recognition process. The efficacy of the proposed techniques is confirmed by experiments on a benchmark test.
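
The abstract does not spell out the training objective, so the following is only a rough sketch of how an MCE-style loss and a negative-training term might look. It assumes the commonly used sigmoid-smoothed MCE loss with a soft maximum over rival classes, and it assumes that a garbage image is handled by treating a fixed rejection threshold as the score of a virtual "garbage" class. The function names (`mce_loss`, `negative_loss`) and the parameters `eta`, `gamma`, and `reject_threshold` are hypothetical and are not taken from the paper.

```python
import math


def mce_loss(scores, label, eta=5.0, gamma=1.0):
    """Sigmoid-smoothed MCE loss for one sample (illustrative assumption).

    scores: list of discriminant values g_j(x); larger means a better match.
    label:  index of the correct class.
    """
    g_true = scores[label]
    rivals = [s for j, s in enumerate(scores) if j != label]
    # Soft maximum over the competing classes (log-mean-exp with sharpness eta),
    # computed relative to the largest rival score for numerical stability.
    m = max(rivals)
    mean_exp = sum(math.exp(eta * (s - m)) for s in rivals) / len(rivals)
    soft_max = m + math.log(mean_exp) / eta
    # Misclassification measure: positive when a rival beats the true class.
    d = soft_max - g_true
    # Smooth 0/1 loss in (0, 1).
    return 1.0 / (1.0 + math.exp(-gamma * d))


def negative_loss(scores, reject_threshold, eta=5.0, gamma=1.0):
    """Loss for a garbage image (hypothetical formulation).

    The rejection threshold acts as the score of a virtual garbage class,
    so any real class scoring above it is penalized.
    """
    augmented = list(scores) + [reject_threshold]
    return mce_loss(augmented, len(scores), eta, gamma)
```

Under these assumptions, a correctly recognized character and a correctly rejected garbage image both give a loss below 0.5, while a confusion or a false acceptance gives a loss above 0.5. Minimizing the sum of both terms over the training set would push the recognizer toward fewer misclassifications and better rejection.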