A Web Data Extraction Description Language and Its Implementation

  • Authors:
  • I-Chen Wu;Jui-Yuan Su;Loon-Been Chen

  • Affiliations:
  • National Chiao Tung University;National Chiao Tung University;National Chiao Tung University

  • Venue:
  • COMPSAC '05 Proceedings of the 29th Annual International Computer Software and Applications Conference - Volume 01
  • Year:
  • 2005

Abstract

A data extraction model, named the browser-oriented data extraction (BODE) model, was proposed in [14] to extract web contents with script functions. In this model, a system built on top of browsers accesses pages by simulating users' operations on browsers. Based on this model, this paper defines a scripting language, named the BODED (Browser-Oriented Data Extraction Description) language, which instructs the system how to perform data extraction. This paper also proposes a technique, called indirect browser replication, to implement a BODE system, and optimizes the performance of this technique.