Occluded text restoration and recognition

  • Authors:
  • Lanlan Chang;Jun Sun;Misako Suwa;Hiroaki Takebe;Yuan He;Satoshi Naoi

  • Affiliations:
  • Fujitsu R & D Center, Beijing, P.R. China;Fujitsu R & D Center, Beijing, P.R. China;Fujitsu Laboratories, Kawasaki, Japan;Fujitsu Laboratories, Kawasaki, Japan;Fujitsu R & D Center, Beijing, P.R. China;Fujitsu R & D Center, Beijing, P.R. China

  • Venue:
  • DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
  • Year:
  • 2010

Abstract

Text occlusion is among the most intractable obstacles for OCR engines. A typical example in document images is visible watermark characters, which are often occluded by foreground content. This paper proposes a solution that restores watermark characters before recognition. The core of the text restoration process is a patch-based restoration method, which reconstructs the missing areas by referring to similar patches from undamaged areas. Patches are filled in an order determined by the structural complexity inside each patch, which helps to suppress the propagation of reconstruction errors. Furthermore, the patch size is selected adaptively based on the local character stroke width. Experiments show that the proposed method produces good restoration quality and effectively improves the recognition rate of the subsequent OCR process. In addition, the algorithm is optimized based on a statistical analysis model, and its processing time meets the real-time response requirement.
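
The abstract gives no implementation details, so the following is only a minimal sketch of what a patch-based restoration loop of this kind might look like on a binary image, not the authors' algorithm. Everything concrete in it is an assumption made for illustration: the function names (`stroke_width`, `window`, `complexity`, `restore`), the use of the median horizontal run length as the stroke-width estimate, a patch radius equal to that estimate, the count of value changes among known pixels as the "structural complexity" measure, the choice to fill low-complexity patches first (the abstract does not state the direction), and Hamming distance as the patch similarity.

```python
from statistics import median

def stroke_width(img, mask):
    """Assumed proxy: median horizontal run length of known foreground pixels."""
    runs = []
    for row, mrow in zip(img, mask):
        run = 0
        for v, m in zip(row, mrow):
            if v == 1 and not m:
                run += 1
            else:
                if run:
                    runs.append(run)
                run = 0
        if run:
            runs.append(run)
    return int(median(runs)) if runs else 1

def window(y, x, r, h, w):
    """Coordinates of the (2r+1)x(2r+1) patch centred at (y, x); None if out of bounds."""
    if y - r < 0 or x - r < 0 or y + r >= h or x + r >= w:
        return None
    return [(y + dy, x + dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]

def complexity(img, mask, coords):
    """Assumed proxy: value changes between consecutive known pixels in scan order."""
    known = [img[y][x] for y, x in coords if not mask[y][x]]
    return sum(a != b for a, b in zip(known, known[1:]))

def restore(img, mask, r=None):
    """Fill pixels where mask is True by copying from the most similar undamaged patch."""
    img = [row[:] for row in img]
    mask = [row[:] for row in mask]
    h, w = len(img), len(img[0])
    if r is None:
        r = max(1, stroke_width(img, mask))  # patch size adapts to stroke width
    centres = [(y, x) for y in range(h) for x in range(w)]
    # Source patches are windows that contain no missing pixels.
    sources = []
    for y, x in centres:
        c = window(y, x, r, h, w)
        if c and not any(mask[py][px] for py, px in c):
            sources.append(c)
    while any(any(row) for row in mask):
        # Pick the partly known window with the lowest (complexity, missing count).
        best = None
        for y, x in centres:
            c = window(y, x, r, h, w)
            if not c:
                continue
            missing = sum(mask[py][px] for py, px in c)
            if missing == 0 or missing == len(c):
                continue
            key = (complexity(img, mask, c), missing)
            if best is None or key < best[0]:
                best = (key, c)
        if best is None or not sources:
            break  # nothing left that can be filled
        target = best[1]

        def dist(src):
            # Hamming distance over the target's known pixels only.
            return sum(img[sy][sx] != img[ty][tx]
                       for (sy, sx), (ty, tx) in zip(src, target)
                       if not mask[ty][tx])

        src = min(sources, key=dist)
        for (sy, sx), (ty, tx) in zip(src, target):
            if mask[ty][tx]:
                img[ty][tx] = img[sy][sx]
                mask[ty][tx] = False
    return img

# Usage: a vertical stroke crossed by a small horizontal occluder.
truth = [[1 if x == 3 else 0 for x in range(7)] for _ in range(7)]
mask = [[y == 3 and 2 <= x <= 4 for x in range(7)] for y in range(7)]
damaged = [[0 if mask[y][x] else truth[y][x] for x in range(7)] for y in range(7)]
restored = restore(damaged, mask)
```

In this toy case the flat background on either side of the stroke is filled first and the pixel on the stroke itself last, so the stroke is completed from an undamaged patch that matches all of its known neighbours. A real system would work on grayscale scans, use a local rather than global stroke-width estimate, and need a far faster patch search than this exhaustive one.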