IEEE article data extraction from internet
INES'09 Proceedings of the IEEE 13th international conference on Intelligent Engineering Systems
Digging the wild web: an interactive tool for web data consolidation
WISE'07 Proceedings of the 8th international conference on Web information systems engineering
A data extraction model, named the browser-oriented data extraction (BODE) model, was proposed in [14] to extract web content with script functions. In this model, a system built on top of a browser accesses pages by simulating users' operations in the browser. Based on this model, this paper defines a scripting language, named the BODED (Browser-Oriented Data Extraction Description) language, which instructs the system how to perform data extraction. The paper also proposes a technique, called indirect browser replication, for implementing a BODE system, and optimizes the performance of this technique.