Visual similarity based document layout analysis

  • Authors:
  • Di Wen; Xiao-Qing Ding

  • Affiliations:
  • Department of Electronic Engineering & State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing, P.R. China; Department of Electronic Engineering & State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing, P.R. China

  • Venue:
  • Journal of Computer Science and Technology - Special section on China AVS standard
  • Year:
  • 2006

Abstract

In this paper, a visual similarity based document layout analysis (DLA) scheme is proposed. It uses a clustering strategy to adaptively handle documents in different languages, with different layout structures and different skew angles. To make the DLA approach robust and adaptive, the authors first use a visual similarity testing process to identify a set of representative filters and statistics that characterize typical texture patterns in document images. Texture features are then extracted with these filters and passed to a dynamic clustering procedure called visual similarity clustering. Finally, text content is located from the clustering results. Because of this scheme, the algorithm shows strong robustness and adaptability across a wide variety of documents, which traditional DLA approaches lack.
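The pipeline described in the abstract has three stages: texture features from filters, a dynamic clustering step, and text location from the clusters. The sketch below illustrates that shape only. It is not the authors' method. The abstract does not specify the filters, the statistics, or the clustering rule, so every name here is a hypothetical choice:

- `block_features` uses ink density and horizontal/vertical transition rates as stand-in "filters".
- `dynamic_cluster` is a simple leader-style clustering followed by k-means refinement, so the number of clusters is not fixed in advance.
- `radius` is a similarity threshold chosen by hand.
- The rule "the cluster with the highest horizontal transition rate is text" is an assumption.

```python
from math import dist

def block_features(img, b):
    """Split a binary image (1 = ink) into b x b blocks; map (row, col) -> features.

    Hypothetical stand-ins for the paper's texture filters:
      density - fraction of ink pixels in the block
      h_trans - ink/background changes along rows, per pixel
      v_trans - ink/background changes along columns, per pixel
    """
    feats = {}
    n = b * b
    for r0 in range(0, len(img) - b + 1, b):
        for c0 in range(0, len(img[0]) - b + 1, b):
            blk = [row[c0:c0 + b] for row in img[r0:r0 + b]]
            density = sum(map(sum, blk)) / n
            h = sum(row[i] != row[i + 1] for row in blk for i in range(b - 1)) / n
            v = sum(blk[j][i] != blk[j + 1][i] for j in range(b - 1) for i in range(b)) / n
            feats[(r0 // b, c0 // b)] = (density, h, v)
    return feats

def dynamic_cluster(vectors, radius, iters=5):
    """Assumed stand-in for "visual similarity clustering" (not the authors' rule).

    A vector seeds a new cluster if no existing centroid lies within `radius`;
    centroids are then refined with a few k-means style iterations.
    """
    centroids = []
    for v in vectors:
        if not centroids or min(dist(v, c) for c in centroids) > radius:
            centroids.append(v)

    def assign():
        return [min(range(len(centroids)), key=lambda k: dist(v, centroids[k]))
                for v in vectors]

    for _ in range(iters):
        labels = assign()
        new = []
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            new.append(tuple(sum(x) / len(members) for x in zip(*members))
                       if members else centroids[k])
        centroids = new
    return assign(), centroids

def locate_text_blocks(img, b=4, radius=0.15):
    """Return the block coordinates assigned to the (hypothetical) text cluster."""
    feats = block_features(img, b)
    keys = [k for k, f in feats.items() if f[0] > 0]  # skip blank background blocks
    if not keys:
        return set()
    labels, centroids = dynamic_cluster([feats[k] for k in keys], radius)
    # Assumed rule: text strokes alternate ink/background rapidly along rows.
    text_label = max(range(len(centroids)), key=lambda k: centroids[k][1])
    return {k for k, lab in zip(keys, labels) if lab == text_label}

# Toy page: stroke-like "text" at top-left, a solid graphic at bottom-right.
text_row = [1, 0] * 4
img = [text_row + [0] * 8 for _ in range(4)] + [[0] * 8 + [1] * 8 for _ in range(4)]
print(sorted(locate_text_blocks(img)))  # [(0, 0), (0, 1)]
```

With a threshold-seeded clustering, the number of texture classes adapts to the page instead of being fixed beforehand. That is one plausible reading of "dynamic" clustering. These toy features are not rotation invariant, so, unlike the scheme the abstract describes, this sketch would not cope with skewed pages without further work.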