A clustering method and radius tuning by end users

  • Authors:
  • H. Takahashi; K. M. Mohiuddin

  • Venue:
  • ICDAR '95: Proceedings of the Third International Conference on Document Analysis and Recognition (Volume 2)
  • Year:
  • 1995

Abstract

In this paper we describe a top-down clustering method consisting of an intra-class step and an inter-class step. In the intra-class step, all the samples for each category are initially divided into a small number of clusters; then the largest cluster is split and its members are reallocated. The largest cluster is chosen using a new concept, the "Volume" of a cluster, which is a hybrid of two common existing criteria for splitting: the number of members in a cluster and the variance of a cluster. In the inter-class step, recognition is performed on the entire training set to assign the best radius to each prototype. The radii are used as normalizing factors in the computation of distance metrics. In our experiments we generated a prototype library by clustering characters written by Americans. When we used another training set, written by Japanese writers, only for tuning the radii of the American library, the recognition rate on the Japanese test set increased from 87.9% to 92.1%. The radii can be tuned even by OCR end users when the application domain is quite different from that of the initial clustering performed by OCR developers.
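
The two steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formula for "Volume", the splitting procedure, or the radius search. The sketch assumes Volume is the member count times the total variance, splits a cluster with a few 2-means passes, and picks each radius greedily from a small candidate grid. All function names and parameters (volume, split_largest, accuracy, tune_radii, candidates, sweeps) are hypothetical.

```python
import numpy as np

def volume(members):
    """Assumed 'Volume': member count times total variance (not the paper's formula)."""
    members = np.asarray(members, dtype=float)
    return len(members) * float(members.var(axis=0).sum())

def split_largest(clusters):
    """Intra-class step (sketch): split the largest-volume cluster in two."""
    idx = max(range(len(clusters)), key=lambda i: volume(clusters[i]))
    target = np.asarray(clusters[idx], dtype=float)
    # Seed two centers at the extremes of the highest-variance feature.
    axis = int(target.var(axis=0).argmax())
    order = target[:, axis].argsort()
    centers = np.stack([target[order[0]], target[order[-1]]])
    for _ in range(10):  # a few 2-means passes reallocate the members
        dists = np.linalg.norm(target[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for k in (0, 1):
            if (assign == k).any():
                centers[k] = target[assign == k].mean(axis=0)
    halves = [target[assign == k] for k in (0, 1) if (assign == k).any()]
    return [c for i, c in enumerate(clusters) if i != idx] + halves

def accuracy(X, y, prototypes, labels, radii):
    """Nearest-prototype recognition rate with radius-normalized distances."""
    dists = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2) / radii
    return float((labels[dists.argmin(axis=1)] == y).mean())

def tune_radii(X, y, prototypes, labels, candidates=(0.5, 1.0, 2.0), sweeps=2):
    """Inter-class step (sketch): greedy per-prototype radius search on a tuning set."""
    radii = np.ones(len(prototypes))
    for _ in range(sweeps):
        for j in range(len(prototypes)):
            best = accuracy(X, y, prototypes, labels, radii)
            for c in candidates:
                trial = radii.copy()
                trial[j] = c
                score = accuracy(X, y, prototypes, labels, trial)
                if score > best:  # keep a radius only if recognition improves
                    best, radii = score, trial
    return radii

# Toy data: fixed prototypes, tuning set from a "different domain".
prototypes = np.array([[0.0], [10.0]])
labels = np.array(["A", "B"])
X = np.array([[0.0], [6.0], [9.0], [10.0]])
y = np.array(["A", "A", "B", "B"])
before = accuracy(X, y, prototypes, labels, np.ones(2))
radii = tune_radii(X, y, prototypes, labels)
after = accuracy(X, y, prototypes, labels, radii)
```

In this toy setup the prototypes stay fixed and only the radii change, which mirrors the end-user tuning scenario described above: enlarging the radius of the class "A" prototype lets it claim the sample at 6.0 that an unnormalized distance would assign to "B".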