The amount of information contained in databases on the Web has grown explosively in recent years. This information, known as the Deep Web, is obtained dynamically by submitting specific queries to these databases through Web Query Interfaces (WQIs). Finding and accessing databases on the Web is a major challenge because Web sites are highly dynamic and the information they hold is heterogeneous. It is therefore necessary to create efficient mechanisms to access, extract, and integrate the information contained in these databases. Since WQIs are the only means of accessing databases on the Web, the automatic identification of WQIs plays an important role in helping traditional search engines increase their coverage and reach interesting information that is not available on the indexable Web. In this paper we present a strategy for the automatic identification of WQIs that uses supervised learning together with a careful selection and extraction of HTML elements in the WQIs to form the training set. We present two experimental tests over a corpus of HTML forms that includes positive and negative examples. Our proposed strategy achieves better accuracy than previous work reported in the literature.
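To make the idea of extracting HTML elements from forms concrete, the following is a minimal sketch of how each form on a page might be turned into a numeric feature vector for a supervised learner. It is an illustration under stated assumptions, not the method of the paper: the names `FormFeatureExtractor`, `form_feature_vectors`, `COUNTED_TAGS`, and `COUNTED_INPUT_TYPES` are hypothetical, and the choice of which elements to count is ours rather than the selection the authors made.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical feature set: which elements to count is an assumption made
# for illustration, not the element selection used in the paper.
COUNTED_INPUT_TYPES = ("text", "password", "checkbox", "radio", "submit", "hidden")
COUNTED_TAGS = ("select", "textarea", "button")


class FormFeatureExtractor(HTMLParser):
    """Counts form-related elements inside each <form> of an HTML page."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one Counter per completed form
        self._current = None   # Counter for the form being parsed, if any

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self._current = Counter()
            return
        if self._current is None:
            return  # element is outside any form, so it is ignored
        if tag == "input":
            # An <input> without a type attribute defaults to a text field.
            input_type = (dict(attrs).get("type") or "text").lower()
            if input_type in COUNTED_INPUT_TYPES:
                self._current["input_" + input_type] += 1
        elif tag in COUNTED_TAGS:
            self._current[tag] += 1

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def form_feature_vectors(html):
    """Returns one list of element counts per form, in a fixed column order."""
    parser = FormFeatureExtractor()
    parser.feed(html)
    parser.close()
    names = ["input_" + t for t in COUNTED_INPUT_TYPES] + list(COUNTED_TAGS)
    return [[form[name] for name in names] for form in parser.forms]
```

Vectors of this kind, paired with labels marking each form as a WQI (positive example) or a non-query form such as a login form (negative example), could then serve as the training set for whichever supervised classifier is chosen.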