A Semi-supervised Ensemble Learning Approach for Character Labeling with Minimal Human Effort

Authors:
Szilárd Vajda;Akmal Junaidi;Gernot A. Fink
Venue:
ICDAR '11 Proceedings of the 2011 International Conference on Document Analysis and Recognition
Year:
2011

Abstract

One of the major issues in handwritten character recognition is the efficient creation of ground truth to train and test the different recognizers. Manual labeling of the data by a human expert is a tedious and costly procedure. In this paper we propose an efficient and low-cost semi-automatic labeling system for character datasets. First, the data are represented at different abstraction levels, and each representation is then clustered in an unsupervised manner. The resulting clusters are labeled by human experts, and finally a unanimity vote decides whether a label is accepted or rejected. The experimental results show that labeling less than 0.5% of the training data is sufficient to achieve a recognition rate of 86.21% for a new script (Lampung) and 94.81% for the MNIST benchmark dataset, using only a K-nearest neighbor classifier for recognition.
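The pipeline described in the abstract (several representations, unsupervised clustering of each, expert labels per cluster, unanimity voting) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract names neither the clustering algorithm nor the features, so plain k-means, the choice of the sample nearest to each cluster center as the one shown to the expert, the toy two-view data, and the helper names `kmeans`, `label_view`, and `unanimity_vote` are all assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the paper's clustering)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        assign = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # leave an empty cluster's center where it is
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return [nearest(p, centers) for p in points], centers

def label_view(points, k, oracle, seed=0):
    """Cluster one representation, query the expert once per cluster,
    and propagate that label to every member of the cluster."""
    assign, centers = kmeans(points, k, seed=seed)
    cluster_label = {}
    for c in set(assign):
        members = [i for i, a in enumerate(assign) if a == c]
        # Assumption: the expert sees the sample closest to the cluster center.
        rep = min(members, key=lambda i: sq_dist(points[i], centers[c]))
        cluster_label[c] = oracle(rep)
    return [cluster_label[a] for a in assign]

def unanimity_vote(per_view_labels):
    """Accept a label only if every view proposes it; otherwise reject (None)."""
    return [votes[0] if len(set(votes)) == 1 else None
            for votes in zip(*per_view_labels)]

# Toy data: 60 samples, two hypothetical "abstraction levels" (views).
rng = random.Random(1)
truth = [0] * 30 + [1] * 30
view_a = [(rng.gauss(5 * t, 0.5), rng.gauss(5 * t, 0.5)) for t in truth]
view_b = [(rng.gauss(-3 * t, 0.5),) for t in truth]

def oracle(i):
    """Simulated human expert: returns the true label of sample i."""
    return truth[i]

votes = [label_view(v, k=4, oracle=oracle, seed=s)
         for s, v in enumerate((view_a, view_b))]
accepted = unanimity_vote(votes)
n_accepted = sum(label is not None for label in accepted)
print(f"{n_accepted} of {len(truth)} samples received a unanimous label")
```

In this sketch the expert is queried at most once per cluster and per view (at most 8 queries for 60 samples), which mirrors the idea of labeling only a small fraction of the data. Samples on which the views disagree are left unlabeled rather than given a possibly wrong label.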