Associating labels and elements of deep web query interface based on DOM
WISM'12 Proceedings of the 2012 international conference on Web Information Systems and Mining
To integrate web databases, the first challenge is to understand what a query interface says, that is, which query capabilities a source supports. Users are not concerned with the internal structure of a web page; in most cases, they identify semantic blocks through visual elements. Therefore, this paper designs a novel schema-extraction algorithm based on the visual features of pages to capture and analyze the attributes and query controls of an interface. First, the query interface region is located with heuristic rules; then, the region is parsed by a page-analysis algorithm; finally, the logical attributes of the region are derived from the page's visual features and represented as a linked list. Experimental results show that this method substantially improves the precision of query schema extraction.
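The abstract does not spell out its heuristics, so the following is only a minimal, hypothetical sketch of the final step: associating text labels with query controls by visual position and chaining the resulting logical attributes into a linked list. It assumes each element's rendered bounding box is already known (for example, from a browser's layout engine), and it uses a common "nearest label on the same row to the left, otherwise nearest label directly above" rule. This rule is an illustrative assumption, not necessarily the authors' method. The names `Box`, `AttrNode`, `score`, `associate`, and `to_list` are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    """Hypothetical rendered element: kind 'label' for text, else a control type."""
    kind: str
    text: str
    x: float  # left edge (px)
    y: float  # top edge (px)
    w: float  # width (px)
    h: float  # height (px)

    @property
    def cy(self) -> float:
        return self.y + self.h / 2  # vertical centre, used for row alignment


@dataclass
class AttrNode:
    """One logical attribute (a label plus its controls) in a singly linked list."""
    label: str
    controls: list[str]
    next_node: Optional[AttrNode] = None


def score(label: Box, ctrl: Box) -> Optional[float]:
    """Gap between label and control, or None if the label is neither
    left of the control on the same row nor above it (assumed rule)."""
    same_row = abs(label.cy - ctrl.cy) <= max(label.h, ctrl.h) / 2
    if same_row and label.x + label.w <= ctrl.x:
        return ctrl.x - (label.x + label.w)  # horizontal gap
    above = label.y + label.h <= ctrl.y and abs(label.x - ctrl.x) <= ctrl.w
    if above:
        return ctrl.y - (label.y + label.h)  # vertical gap
    return None


def associate(boxes: list[Box]) -> Optional[AttrNode]:
    """Assign each control to its closest qualifying label, then link the
    attributes in reading order (top-to-bottom, left-to-right)."""
    labels = [b for b in boxes if b.kind == "label"]
    controls = [b for b in boxes if b.kind != "label"]
    groups: dict[int, list[str]] = {}
    for ctrl in controls:
        best: Optional[tuple[float, int]] = None
        for i, lab in enumerate(labels):
            s = score(lab, ctrl)
            if s is not None and (best is None or s < best[0]):
                best = (s, i)
        if best is not None:
            groups.setdefault(best[1], []).append(ctrl.text)
    order = sorted(groups, key=lambda i: (labels[i].y, labels[i].x))
    head: Optional[AttrNode] = None
    for i in reversed(order):  # build from the tail so head is first in order
        head = AttrNode(labels[i].text, groups[i], head)
    return head


def to_list(head: Optional[AttrNode]) -> list[tuple[str, list[str]]]:
    """Walk the linked list and return (label, controls) pairs."""
    out = []
    while head is not None:
        out.append((head.label, head.controls))
        head = head.next_node
    return out


# Usage: a made-up search form with a row-aligned label, a label shared by
# two controls (a range), and a label placed above its control.
page = [
    Box("label", "Title", 10, 10, 40, 20),
    Box("text", "title_input", 60, 10, 150, 20),
    Box("label", "Price range", 10, 50, 80, 20),
    Box("text", "price_min", 100, 50, 60, 20),
    Box("text", "price_max", 170, 50, 60, 20),
    Box("label", "Author", 10, 90, 50, 20),
    Box("text", "author_input", 10, 115, 150, 20),
]
print(to_list(associate(page)))
```

Grouping several controls under one label lets a single logical attribute such as a price range keep both of its inputs. A real extractor would also need the region-detection and parsing steps the abstract describes, which this sketch does not attempt.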