Automatic identification of web query interfaces

  • Authors:
  • Heidy M. Marin-Castro;Victor J. Sosa-Sosa;Ivan Lopez-Arevalo

  • Affiliations:
  • Center of Research and Advanced Studies of the National Polytechnic Institute, Information Technology Laboratory, Scientific and Technological Park of Tamaulipas TECNOTAM, Mexico (all three authors)

  • Venue:
  • MICAI'11 Proceedings of the 10th international conference on Artificial Intelligence: advances in Soft Computing - Volume Part II
  • Year:
  • 2011

Abstract

The amount of information contained in databases on the Web has grown explosively in recent years. This information, known as the Deep Web, is obtained dynamically by submitting specific queries to these databases through Web Query Interfaces (WQIs). Finding and accessing databases on the Web is a great challenge because Web sites are highly dynamic and the information they hold is heterogeneous. It is therefore necessary to create efficient mechanisms to access, extract and integrate the information contained in these databases. Since WQIs are the only means of accessing databases on the Web, the automatic identification of WQIs plays an important role in helping traditional search engines increase their coverage and reach interesting information that is not available on the indexable Web. In this paper we present a strategy for the automatic identification of WQIs that uses supervised learning and builds the training set from a careful selection and extraction of HTML elements in the WQIs. We present two experimental tests over a corpus of HTML forms containing positive and negative examples. Our proposed strategy achieves better accuracy than previous work reported in the literature.
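To make the general idea concrete, the following is a minimal sketch of how HTML forms could be turned into feature vectors and labeled by a supervised learner. It is not the authors' method: the abstract does not say which HTML elements are selected or which classifier is used, so the `FEATURES` list, the nearest-centroid classifier, the function names, and the two toy training forms are all assumptions made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser
import math

# Hypothetical feature set: counts of common form controls.
# The abstract does not list the HTML elements actually used.
FEATURES = ["text", "password", "checkbox", "radio", "submit", "select", "textarea"]


class FormFeatureParser(HTMLParser):
    """Counts form controls in the HTML it is fed."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            # An <input> without a type attribute defaults to a text field.
            input_type = (dict(attrs).get("type") or "text").lower()
            self.counts[input_type] += 1
        elif tag in ("select", "textarea"):
            self.counts[tag] += 1


def extract_features(form_html):
    """Maps one HTML form to a fixed-length vector of control counts."""
    parser = FormFeatureParser()
    parser.feed(form_html)
    return [parser.counts[name] for name in FEATURES]


def train_centroids(examples):
    """examples: list of (form_html, label). Returns label -> mean feature vector."""
    grouped = {}
    for form_html, label in examples:
        grouped.setdefault(label, []).append(extract_features(form_html))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(form_html, centroids):
    """Assigns the label whose centroid is closest in Euclidean distance."""
    vector = extract_features(form_html)
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


# Toy training set (invented): one positive (WQI) and one negative (login) form.
SEARCH_FORM = ('<form><input type="text" name="q"><select name="cat">'
               '<option>a</option></select><input type="submit"></form>')
LOGIN_FORM = ('<form><input type="text" name="user">'
              '<input type="password" name="pw"><input type="submit"></form>')

centroids = train_centroids([(SEARCH_FORM, "wqi"), (LOGIN_FORM, "non-wqi")])
unseen = '<form><input name="title"><select name="year"></select><input type="submit"></form>'
label = classify(unseen, centroids)
```

A nearest-centroid rule is used here only because it fits in a few lines of standard-library code; any supervised classifier could take its place once the forms are represented as feature vectors.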