Probability Table Compression for Handwritten Character Recognition

  • Authors:
  • Sung-Jung Cho; Michael Perrone; Eugene Ratzlaff

  • Venue:
  • ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
  • Year:
  • 2003

Abstract

This paper presents a new probability table memory compression method based on mixture models and its application to N-tuple recognizers and N-gram character language models. Joint probability tables are decomposed into lower-dimensional probability components and their mixtures. The maximum likelihood parameters of the mixture models are trained with the Expectation-Maximization (EM) algorithm and quantized to one-byte integers. Probability elements that the mixture models do not estimate reliably are kept separately. Experimental results with on-line handwritten UNIPEN uppercase and lowercase characters show that the total memory size of an on-line scanning N-tuple recognizer is reduced from 12.3MB to 0.66MB, while the recognition rate drops from 91.64% to 91.13% for uppercase characters and from 88.44% to 87.31% for lowercase characters. The N-gram character language model was compressed from 73.6MB to 0.58MB with minimal reduction in performance.
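
The following is a minimal sketch, not the authors' code, of the idea the abstract describes: a 2-D joint probability table P(i, j) is approximated by a mixture of products of 1-D distributions, P(i, j) ≈ Σ_k w_k a_k(i) b_k(j), the mixture is fit with EM, the parameters are quantized to one-byte integers, and cells the mixture estimates poorly are kept in a separate exception table. The component count, iteration count, and error threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

def em_decompose(P, n_components=8, n_iters=50, seed=0):
    """Fit P(i, j) ~ sum_k w[k] * A[k, i] * B[k, j] by EM (illustrative)."""
    rng = np.random.default_rng(seed)
    n, m = P.shape
    w = np.full(n_components, 1.0 / n_components)
    A = rng.random((n_components, n)); A /= A.sum(axis=1, keepdims=True)
    B = rng.random((n_components, m)); B /= B.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        # E-step: responsibility of component k for each table cell (i, j).
        joint = w[:, None, None] * A[:, :, None] * B[:, None, :]
        r = joint / np.maximum(joint.sum(axis=0, keepdims=True), 1e-12)
        # M-step: re-estimate weights and 1-D components from P-weighted counts.
        c = r * P[None, :, :]
        w = c.sum(axis=(1, 2))
        A = c.sum(axis=2) / np.maximum(w[:, None], 1e-12)
        B = c.sum(axis=1) / np.maximum(w[:, None], 1e-12)
        w /= w.sum()
    return w, A, B

def quantize_u8(x):
    """Quantize non-negative parameters to one-byte integers plus a float scale."""
    scale = x.max() / 255.0 if x.max() > 0 else 1.0
    return np.round(x / scale).astype(np.uint8), scale

# Toy joint table; in the paper the tables come from N-tuple features
# and N-gram character contexts.
P = np.random.default_rng(0).random((64, 64))
P /= P.sum()
w, A, B = em_decompose(P)

# One-byte parameters; dequantize on the fly when scoring.
qA, sA = quantize_u8(A)
qB, sB = quantize_u8(B)
P_hat = np.einsum('k,ki,kj->ij', w, qA * sA, qB * sB)

# Exception table: keep cells the mixture does not estimate reliably
# (the 0.5 relative-error threshold is a hypothetical choice).
rel_err = np.abs(P - P_hat) / np.maximum(P, 1e-12)
exceptions = {(i, j): P[i, j] for i, j in zip(*np.where(rel_err > 0.5))}
```

The memory saving comes from replacing the full n×m float table with K(n + m) one-byte parameters plus a small sparse exception table; a lookup first checks the exception table and otherwise reconstructs the probability from the mixture.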