Detecting and Partitioning Data Objects in Complex Web Pages

  • Authors: Shiren Ye; Tat-Seng Chua
  • Affiliations: National University of Singapore; National University of Singapore
  • Venue: WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence
  • Year: 2004

Abstract

This paper presents an automated approach for detecting and partitioning data objects, or product descriptions, in complex Web pages. First, we derive the common page structure by comparing similar pages and then identify the data region that covers the descriptions of the data objects. Second, we partition the nodes in the data region that belong to different data objects and construct self-explanatory XML output files. Experiments indicate that our technique is effective.
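
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the two steps under simplifying assumptions, not the authors' method. It assumes well-formed XHTML input, treats any node whose (path, text) pair also occurs in a similar page as shared template structure, takes the deepest element covering all remaining nodes as the data region, and starts a new data object whenever the region's first child tag recurs. All names (`PAGE_A`, `PAGE_B`, `signatures`, `find_data_region`, `partition`, `to_xml`) and the sample pages are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two hypothetical pages from the same site: same template, different data.
PAGE_A = (
    "<html><body>"
    '<div id="nav"><span>Home</span></div>'
    '<div id="main"><h2>Camera X</h2><p>$199</p><h2>Camera Y</h2><p>$249</p></div>'
    '<div id="footer"><span>Contact</span></div>'
    "</body></html>"
)
PAGE_B = (
    "<html><body>"
    '<div id="nav"><span>Home</span></div>'
    '<div id="main"><h2>Laptop Z</h2><p>$899</p></div>'
    '<div id="footer"><span>Contact</span></div>'
    "</body></html>"
)

def signatures(root):
    """Map every element to a (tag path, stripped text) signature."""
    sigs = {}
    def walk(node, path):
        path = path + "/" + node.tag
        sigs[node] = (path, (node.text or "").strip())
        for child in node:
            walk(child, path)
    walk(root, "")
    return sigs

def find_data_region(page, other):
    """Step 1 (assumed heuristic): nodes whose signature also occurs in the
    similar page are template; return the deepest element covering the rest."""
    shared = set(signatures(other).values())
    sigs = signatures(page)
    def contains_varying(node):
        return sigs[node] not in shared or any(contains_varying(c) for c in node)
    region = page
    while True:
        hits = [c for c in region if contains_varying(c)]
        # Descend only while a single non-leaf child holds all varying content.
        if len(hits) != 1 or len(hits[0]) == 0:
            return region
        region = hits[0]

def partition(region):
    """Step 2 (assumed heuristic): start a new data object each time the tag
    of the region's first child reappears."""
    children = list(region)
    groups = []
    for child in children:
        if not groups or child.tag == children[0].tag:
            groups.append([])
        groups[-1].append(child)
    return groups

def to_xml(groups):
    """Serialize the partitioned objects as a small XML document."""
    out = ET.Element("objects")
    for group in groups:
        obj = ET.SubElement(out, "object")
        for node in group:
            ET.SubElement(obj, node.tag).text = (node.text or "").strip()
    return ET.tostring(out, encoding="unicode")

region = find_data_region(ET.fromstring(PAGE_A), ET.fromstring(PAGE_B))
print(to_xml(partition(region)))
```

On these sample pages the sketch selects the `main` division as the data region and emits one `object` element per product. Real pages would need a tolerant HTML parser and more robust structure comparison than exact signature matching.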