A segmentation method for web page analysis using shrinking and dividing

  • Authors:
  • Jiuxin Cao;Bo Mao;Junzhou Luo

  • Affiliations:
  • Sch. of Comp. Sci. and Eng. and Jiangsu Provincial Key Lab. of Network and Info. Sec., Southeast Univ., Nanjing, China and Key Lab. of Comp. Network and Info. Integration, Min. of Ed., Southeast U ...;Sch. of Comp. Sci. and Eng. and Jiangsu Provincial Key Lab. of Network and Info. Sec., Southeast Univ., Nanjing, China and Key Lab. of Comp. Network and Info. Integration, Min. of Ed., Southeast U ...;Sch. of Comp. Sci. and Eng. and Jiangsu Provincial Key Lab. of Network and Info. Sec., Southeast Univ., Nanjing, China and Key Lab. of Comp. Network and Info. Integration, Min. of Ed., Southeast U ...

  • Venue:
  • International Journal of Parallel, Emergent and Distributed Systems - Network and parallel computing
  • Year:
  • 2010

Abstract

This paper proposes a new web page segmentation method, iterated shrinking and dividing, based on image processing techniques and the visual characteristics of web pages. It introduces dividing conditions and the concept of a dividing zone, and uses them to split a web page image into visually consistent sub-images by iteratively shrinking and splitting. First, the web page is saved as an image, which is preprocessed with an edge detection algorithm such as Canny. Then dividing zones are detected and the image is segmented repeatedly until every block is indivisible. The method can be used for web page analysis tasks such as detecting similar visual layouts. Experiments show that the algorithm is suitable for web page segmentation and performs well in terms of extensibility and performance.
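
The shrink-then-divide loop described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual dividing conditions, so this sketch rests on several assumptions: the input is already a binary edge map (for example, the output of a Canny detector, which is not run here), a "dividing zone" is taken to be a run of at least `min_gap` consecutive rows or columns containing no edge pixels, and the widest such zone is split first. The names `shrink`, `widest_gap`, `segment`, and `min_gap` are hypothetical and do not come from the paper.

```python
from typing import List, Optional, Tuple

# A box is (top, left, bottom, right); bottom and right are exclusive.
Box = Tuple[int, int, int, int]


def shrink(edges: List[List[int]], box: Box) -> Optional[Box]:
    """Tighten the box to the smallest rectangle containing all edge pixels."""
    top, left, bottom, right = box
    rows = [r for r in range(top, bottom) if any(edges[r][left:right])]
    cols = [c for c in range(left, right)
            if any(edges[r][c] for r in range(top, bottom))]
    if not rows or not cols:
        return None  # the box contains no edge pixels at all
    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)


def widest_gap(occupied: List[bool], min_gap: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest empty run of length >= min_gap."""
    best = None
    start = None
    # A trailing True sentinel closes any run that reaches the end.
    for i, occ in enumerate(occupied + [True]):
        if not occ and start is None:
            start = i
        elif occ and start is not None:
            if i - start >= min_gap and (
                best is None or i - start > best[1] - best[0]
            ):
                best = (start, i)
            start = None
    return best


def segment(edges: List[List[int]], box: Optional[Box] = None,
            min_gap: int = 2) -> List[Box]:
    """Recursively shrink and divide until every block is indivisible."""
    if box is None:
        box = (0, 0, len(edges), len(edges[0]))
    shrunk = shrink(edges, box)
    if shrunk is None:
        return []
    top, left, bottom, right = shrunk
    row_occ = [any(edges[r][left:right]) for r in range(top, bottom)]
    col_occ = [any(edges[r][c] for r in range(top, bottom))
               for c in range(left, right)]
    h = widest_gap(row_occ, min_gap)  # assumed horizontal dividing zone
    v = widest_gap(col_occ, min_gap)  # assumed vertical dividing zone
    if h is None and v is None:
        return [shrunk]  # indivisible block
    if v is None or (h is not None and h[1] - h[0] >= v[1] - v[0]):
        first = (top, left, top + h[0], right)
        second = (top + h[1], left, bottom, right)
    else:
        first = (top, left, bottom, left + v[0])
        second = (top, left + v[1], bottom, right)
    return segment(edges, first, min_gap) + segment(edges, second, min_gap)
```

Calling `segment(edge_map)` on a list of rows of 0/1 values returns the indivisible blocks as bounding boxes. Because shrinking always leaves edge pixels on the first and last row and column of a box, every detected gap lies strictly inside it, so both halves of a split are non-empty and the recursion terminates.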