A General Approach for Partitioning Web Page Content Based on Geometric and Style Information

  • Authors: H. Guo; J. Mahmud; Y. Borodin; A. Stent; I. Ramakrishnan
  • Affiliation: Stony Brook University, NY (all authors)
  • Venue: ICDAR '07 Proceedings of the Ninth International Conference on Document Analysis and Recognition - Volume 02
  • Year: 2007

Abstract

In this paper, we describe a general-purpose approach for partitioning Web page content. The novelty of our approach lies in the use of detailed layout information from a Web page renderer to determine spatial locality and identify visual separators, and the use of relaxed matching over presentation style information to determine presentation style similarity. We present several examples to illustrate the generality of our approach.
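
The abstract names two signals, spatial locality from rendered layout and relaxed matching over presentation style. The following is a minimal illustrative sketch of how such signals could be combined, not the authors' algorithm. The `Block` type, the greedy grouping strategy, the use of vertical whitespace as the only separator cue, and the `max_gap` and `min_similarity` thresholds are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A rendered content block (hypothetical type, not from the paper)."""
    text: str
    x: float
    y: float
    width: float
    height: float
    style: dict = field(default_factory=dict)


def style_similarity(a: dict, b: dict) -> float:
    """Fraction of style attributes on which two blocks agree.

    This is 'relaxed' in the sense that blocks need not match on every
    attribute; an attribute present on only one side counts as a mismatch.
    """
    keys = set(a) | set(b)
    if not keys:
        return 1.0
    matches = sum(1 for k in keys if a.get(k) == b.get(k))
    return matches / len(keys)


def vertical_gap(upper: Block, lower: Block) -> float:
    """Whitespace between the bottom of `upper` and the top of `lower`."""
    return lower.y - (upper.y + upper.height)


def partition(blocks, max_gap=20.0, min_similarity=0.6):
    """Greedily group blocks in reading order (assumed strategy).

    A block joins the current group only if it is spatially close to the
    previous block and similar enough in style; otherwise a large gap or a
    style change is treated as a separator and a new group starts.
    """
    ordered = sorted(blocks, key=lambda b: (b.y, b.x))
    groups = []
    for block in ordered:
        if groups:
            prev = groups[-1][-1]
            close = vertical_gap(prev, block) <= max_gap
            similar = style_similarity(prev.style, block.style) >= min_similarity
            if close and similar:
                groups[-1].append(block)
                continue
        groups.append([block])
    return groups
```

With these assumed thresholds, two adjacent paragraphs that share font family and size but differ in color (similarity 2/3) would stay in one group, while a headline in a different font, or a footer separated by a large vertical gap, would start a new group.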