Spatial Relation Based Object Extraction from the World Wide Web

  • Authors:
  • Hao Jingmin; Liao Lejian; He Di

  • Venue:
  • WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 03
  • Year:
  • 2008

Abstract

Statistical observations show that Web information about objects of the same type follows regular spatial distribution patterns across different Web sites. In particular, the spatial distance between components of a single object is always less than the distance between different objects. Building on this observation, a novel method is presented that extracts objects from the World Wide Web using the spatial configuration of Web documents. The method is a fully automatic, bottom-up object extraction process. It relies primarily on the distribution characteristics of Web information and is independent of the underlying document representation, such as HTML code. Experiments show that the proposed method works well even when the HTML structure differs greatly from the layout structure, with encouraging results.
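
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea it describes: grouping rendered page elements bottom-up by spatial distance, without consulting the HTML tree. It is not the authors' method. The `Box` type, the `gap` and `group_by_distance` functions, the single-linkage merge rule, and the fixed `threshold` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Rendered bounding box of one page element (hypothetical layout data)."""
    label: str
    x: float
    y: float
    w: float
    h: float


def gap(a: Box, b: Box) -> float:
    """Euclidean gap between two rectangles; 0 if they touch or overlap."""
    dx = max(a.x - (b.x + b.w), b.x - (a.x + a.w), 0.0)
    dy = max(a.y - (b.y + b.h), b.y - (a.y + a.h), 0.0)
    return (dx * dx + dy * dy) ** 0.5


def group_by_distance(boxes, threshold):
    """Bottom-up grouping (assumed single-linkage rule): repeatedly merge
    two clusters whose closest boxes are at most `threshold` apart."""
    clusters = [[b] for b in boxes]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            closest = min(gap(a, b) for a in clusters[i] for b in clusters[j])
            if closest <= threshold:
                clusters[i].extend(clusters[j])
                del clusters[j]
                merged = True
                break  # cluster indices changed, so restart the scan
    return clusters


# Two made-up product records laid out one above the other.
page = [
    Box("title A", 0, 0, 200, 20),
    Box("price A", 0, 25, 80, 15),
    Box("title B", 0, 100, 200, 20),
    Box("price B", 0, 125, 80, 15),
]
objects = group_by_distance(page, threshold=10)
for obj in objects:
    print([b.label for b in obj])
```

In this toy layout the components of each record sit 5 units apart while the records are 60 units apart, so any threshold between those values separates them. A real page would need the threshold derived from the observed distance distribution rather than fixed by hand.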