Associating labels and elements of deep web query interface based on DOM

  • Authors:
  • Baohua Qiang; Long Shi; Chunming Wu; Qian He; Chao Shen

  • Affiliations:
  • School of Computer Science and Engineering, Guilin University of Electronic Technology, Guilin, P.R. China, College of Computer and Information Science, Southwest University, Chongqing, P.R. China; School of Computer Science and Engineering, Guilin University of Electronic Technology, Guilin, P.R. China; College of Computer and Information Science, Southwest University, Chongqing, P.R. China; School of Computer Science and Engineering, Guilin University of Electronic Technology, Guilin, P.R. China; School of Computer Science and Engineering, Guilin University of Electronic Technology, Guilin, P.R. China

  • Venue:
  • WISM'12 Proceedings of the 2012 international conference on Web Information Systems and Mining
  • Year:
  • 2012

Abstract

Query interface schema extraction is an important problem in Deep Web data acquisition and integration. To obtain the query interface schema, the elements and labels of a Deep Web query interface must first be associated correctly. Because a query interface on an HTML page can be parsed into a well-structured DOM tree, we propose an algorithm that associates elements and labels of Deep Web query interfaces based on the hierarchical DOM. The algorithm relies mainly on nearest-neighbor distance, together with two other heuristic rules, to find the label most closely related to a given control element. Experimental results on real query interfaces show that the proposed algorithm is highly effective.
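
The nearest-neighbor idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not define the distance measure or the two additional heuristic rules, so the sketch assumes that distance is the number of edges between two nodes in the DOM tree and that ties go to the label that comes first in document order. The names `FormCollector`, `tree_distance`, and `associate` are hypothetical, and the markup is assumed to be well formed (every non-void tag is closed).

```python
from html.parser import HTMLParser

CONTROLS = {"input", "select", "textarea"}
VOID = {"input", "br", "img", "hr", "meta", "link"}  # tags with no closing tag
OPAQUE = {"select", "textarea"}  # text inside these is not a label candidate


class FormCollector(HTMLParser):
    """Records the root-to-node path of each text node and control element."""

    def __init__(self):
        super().__init__()
        self.path = []      # child indices from the root to the open element
        self.counts = [0]   # children seen so far at each open level
        self.skip = 0       # > 0 while inside a select or textarea
        self.labels = []    # (path, text) pairs
        self.controls = []  # (path, name) pairs

    def _next_path(self):
        idx = self.counts[-1]
        self.counts[-1] += 1
        return self.path + [idx]

    def handle_starttag(self, tag, attrs):
        node_path = self._next_path()
        if tag in CONTROLS:
            self.controls.append((node_path, dict(attrs).get("name", tag)))
        if tag in OPAQUE:
            self.skip += 1
        if tag not in VOID:
            self.path = node_path
            self.counts.append(0)

    def handle_endtag(self, tag):
        if tag in OPAQUE and self.skip:
            self.skip -= 1
        if tag not in VOID and self.path:
            self.path = self.path[:-1]
            self.counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip:
            self.labels.append((self._next_path(), text))


def tree_distance(a, b):
    """Number of edges between two nodes, given their root-to-node paths."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return len(a) + len(b) - 2 * common


def associate(html):
    """Maps each control's name to the text of its nearest label candidate."""
    collector = FormCollector()
    collector.feed(html)
    collector.close()
    result = {}
    for c_path, name in collector.controls:
        # Assumed tie-break: prefer a label that precedes the control.
        best = min(
            collector.labels,
            key=lambda lab: (tree_distance(lab[0], c_path), lab[0] > c_path),
            default=None,
        )
        result[name] = best[1] if best else None
    return result


SAMPLE = """<form><table>
<tr><td>Title</td><td><input name="title"></td></tr>
<tr><td>Author</td><td><input name="author"></td></tr>
</table></form>"""

print(associate(SAMPLE))  # {'title': 'Title', 'author': 'Author'}
```

In the sample, each input is four edges away from the text in its own table row and six edges away from the text in the other row, so each control is paired with the label in its own row. A practical implementation would also need to handle malformed HTML and layout-based proximity, which this sketch ignores.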