Character confidence based on N-best list for keyword spotting in online Chinese handwritten documents

  • Authors:
  • Heng Zhang;Da-Han Wang;Cheng-Lin Liu

  • Affiliations:
  • -;-;-

  • Venue:
  • Pattern Recognition
  • Year:
  • 2014

Quantified Score

Hi-index 0.01

Visualization

Abstract

In keyword spotting from handwritten documents by text query, the word similarity is usually computed by combining character similarities, which are desired to approximate the logarithm of the character probabilities. In this paper, we propose to directly estimate the posterior probability (also called confidence) of candidate characters based on the N-best paths from the candidate segmentation-recognition lattice. On evaluating the candidate segmentation-recognition paths by combining multiple contexts, the scores of the N-best paths are transformed to posterior probabilities using soft-max. The parameter of soft-max (confidence parameter) is estimated from the character confusion network, which is constructed by aligning different paths using a string matching algorithm. The posterior probability of a candidate character is the summation of the probabilities of the paths that pass through the candidate character. We compare the proposed posterior probability estimation method with some reference methods including the word confidence measure and the text line recognition method. Experimental results of keyword spotting on a large database CASIA-OLHWDB of unconstrained online Chinese handwriting demonstrate the effectiveness of the proposed method.