Word segmentation of handwritten text using supervised classification techniques

  • Authors:
  • Yi Sun;Timothy S. Butler;Alex Shafarenko;Rod Adams;Martin Loomes;Neil Davey

  • Affiliations:
  • Department of Computer Science, Faculty of Engineering and Information Sciences, University of Hertfordshire, College Lane, Hatfield, Hertfordshire AL10 9AB, UK (all authors)

  • Venue:
  • Applied Soft Computing
  • Year:
  • 2007

Abstract

Recent work on extracting features of gaps in handwritten text allows these gaps to be classified into inter-word and intra-word classes using suitable classification techniques. In this paper, we first analyse the features of the gaps using mutual information. We then investigate the underlying data distribution using visualisation methods. The visualisations suggest that the data have a complicated structure, which makes the gaps difficult to separate into two distinct classes. We apply five different supervised classification algorithms from the machine learning field to both the original dataset and a dataset containing only the best features selected using mutual information. We further improve the classification result by adding a set of feature variables describing the strokes preceding and following each gap. The classifiers are compared using McNemar's test. We find that support vector machines (SVMs) and multilayer perceptrons (MLPs) outperform the other classifiers and that preprocessing to select features works well. The best classification result attained suggests that the technique we employ is particularly suitable for digital ink manipulation at the level of words.
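
As an illustration of the classifier comparison mentioned above, the following is a minimal sketch of McNemar's test for two classifiers evaluated on the same test set. It is not taken from the paper: the function name `mcnemar`, the label encoding (1 for an inter-word gap, 0 for an intra-word gap) and the use of the continuity-corrected chi-square form are all assumptions made for the example.

```python
import math


def mcnemar(y_true, pred_a, pred_b):
    """Compare two classifiers on the same examples with McNemar's test.

    Hypothetical helper, not the authors' code. Returns (b, c, chi2, p):
      b    = examples classifier A gets right and classifier B gets wrong
      c    = examples classifier A gets wrong and classifier B gets right
      chi2 = continuity-corrected statistic (|b - c| - 1)^2 / (b + c)
      p    = p-value from the chi-square distribution with 1 degree of freedom
    """
    b = sum(1 for t, a, bb in zip(y_true, pred_a, pred_b) if a == t and bb != t)
    c = sum(1 for t, a, bb in zip(y_true, pred_a, pred_b) if a != t and bb == t)

    # If the classifiers never disagree in correctness, there is no evidence
    # of a difference between them.
    if b + c == 0:
        return b, c, 0.0, 1.0

    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return b, c, chi2, p


if __name__ == "__main__":
    # Toy labels (assumed encoding): 1 = inter-word gap, 0 = intra-word gap.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    pred_a = [1, 1, 1, 1, 0, 0, 0, 1]
    pred_b = [1, 0, 0, 0, 0, 0, 1, 1]
    b, c, chi2, p = mcnemar(y_true, pred_a, pred_b)
    print(f"b={b} c={c} chi2={chi2:.3f} p={p:.4f}")
```

Only the examples on which exactly one classifier is correct enter the statistic, so the test is suited to comparing classifiers that share a single test set. With very few disagreements, an exact binomial version of the test would normally be preferred over the chi-square approximation used here.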