Skew detection for complex document images using robust borderlines in both text and non-text regions

  • Authors:
  • Hong Liu; Qi Wu; Hongbin Zha; Xueping Liu

  • Affiliations:
  • National Laboratory on Machine Perception, Peking University, Beijing 100871, China and Shenzhen Graduate School, Peking University, Beijing 100871, China; National Laboratory on Machine Perception, Peking University, Beijing 100871, China; National Laboratory on Machine Perception, Peking University, Beijing 100871, China; Ricoh Co., Japan

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2008

Abstract

This paper proposes a skew detection method for complex document images based on robust borderlines extracted from both text and non-text regions. First, borderlines are extracted from the borders of large connected components in the document image using a run-length-based method. Second, after non-linear borderlines are filtered out, a fast iterative algorithm optimizes the directional angle of each linear borderline. Finally, the weighted median of all the directional angles is taken as the skew angle of the whole document. In experiments on 2000 document images with various skews, the overall correct rate is 95.2%, and the average detection time is less than 0.2 s per document. The method is well suited to complex documents with horizontal and vertical text layouts and with text in English, Japanese, or Chinese, especially documents with predominantly non-text regions or sparse text regions.
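
The last two steps of the abstract (one angle per linear borderline, then a weighted median over all angles) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not describe the fast iterative optimization or the weighting scheme, so the sketch substitutes a closed-form principal-axis fit for the angle estimate and assumes each borderline is weighted by its length in pixels. The function names `line_angle_deg` and `weighted_median` are hypothetical.

```python
import math


def line_angle_deg(points):
    """Direction of a roughly linear point set, in degrees.

    Assumption: a closed-form principal-axis (total least squares) fit
    stands in for the paper's iterative optimization, which the abstract
    does not specify.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Orientation of the principal axis of the scatter matrix.
    return math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    acc = 0.0
    for value, weight in pairs:
        acc += weight
        if acc >= total / 2.0:
            return value
    raise ValueError("weighted_median requires at least one value")


# Hypothetical borderline angles (degrees) and lengths (pixels, used as weights).
angles = [1.9, 2.0, 2.1, 15.0]
lengths = [100, 300, 200, 50]
skew = weighted_median(angles, lengths)
```

In this made-up example the short borderline at 15.0 degrees carries little weight, so it does not pull the estimate away from the cluster near 2 degrees. That robustness to outliers is the usual reason to prefer a weighted median over a weighted mean.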