Video text recognition using sequential Monte Carlo and error voting methods

  • Authors:
  • Datong Chen;Jean-Marc Odobez

  • Affiliations:
  • IDIAP Research Institute, Rue du Simplon 4, 1920 Martigny, Valais, Switzerland;IDIAP Research Institute, Rue du Simplon 4, 1920 Martigny, Valais, Switzerland

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2005

Abstract

This paper addresses the segmentation and recognition of text embedded in video sequences, starting from the text image sequences extracted by a text detection module. To this end, we propose a probabilistic algorithm based on Bayesian adaptive thresholding and Monte Carlo sampling. The algorithm approximates the posterior distribution of the segmentation thresholds of text pixels in an image by a set of weighted samples. The set of samples is initialized by applying a classical segmentation algorithm to the first video frame and is further refined by random sampling under a temporal Bayesian framework. One important contribution of the paper is to show that, thanks to the proposed methodology, the likelihood of a segmentation parameter sample can be estimated not from a classification criterion or a visual quality criterion based on the produced segmentation map, but directly from the induced text recognition result, which is directly relevant to our task. As a second contribution, we propose to align the text recognition results from high-confidence samples gathered over time and to compose a final result at the character level using an error voting technique (ROVER). Experiments are conducted on a two-hour video database. Character recognition rates higher than 93% and word recognition rates higher than 90% are achieved, which are 4% and 3% higher than those of state-of-the-art methods applied to the same database.
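
The two ideas in the abstract, weighted threshold samples refined over time and character-level voting, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `smc_step` and `vote_characters`, the Gaussian diffusion with `noise=5.0`, the 0 to 255 grey-level range, and the `likelihood` callable (standing in for a score derived from the recognition result) are all assumptions made for illustration. The voting function assumes the strings have already been aligned to equal length with a gap symbol and takes a plain majority per column, which is a simplification of ROVER.

```python
import random
from collections import Counter


def smc_step(samples, weights, likelihood, noise=5.0, rng=random):
    """One sequential Monte Carlo update over scalar thresholds.

    samples:    threshold values carried over from the previous frame
    weights:    matching non-negative weights with a positive sum
    likelihood: hypothetical callable mapping a threshold to a
                non-negative score (e.g. a recognition confidence)
    """
    n = len(samples)
    # Resample thresholds in proportion to their previous weights.
    resampled = rng.choices(samples, weights=weights, k=n)
    # Diffuse with Gaussian noise, clamped to an assumed 8-bit grey range.
    moved = [min(255.0, max(0.0, s + rng.gauss(0.0, noise))) for s in resampled]
    # Re-weight each sample using the likelihood on the new frame.
    raw = [likelihood(s) for s in moved]
    total = sum(raw)
    if total == 0:
        # Fall back to uniform weights if every sample scored zero.
        new_weights = [1.0 / n] * n
    else:
        new_weights = [w / total for w in raw]
    return moved, new_weights


def vote_characters(aligned, gap="-"):
    """Majority vote per column over equal-length, pre-aligned strings."""
    if len({len(s) for s in aligned}) > 1:
        raise ValueError("aligned strings must have equal length")
    out = []
    for column in zip(*aligned):
        best, _ = Counter(column).most_common(1)[0]
        # A winning gap means most hypotheses had no character here.
        if best != gap:
            out.append(best)
    return "".join(out)
```

For example, calling `vote_characters(["HELLO", "HELL0", "HELLO"])` picks the majority character in each column, so a single misread character in one hypothesis is outvoted by the other two.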