Learning to Understand Information on the Internet: AnExample-Based Approach
Journal of Intelligent Information Systems - Special issue: next generation information technologies and systems
Computing capabilities of mediators
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Automatic discovery of language models for text databases
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
GlOSS: text-source discovery over the Internet
ACM Transactions on Database Systems (TODS)
Probe, count, and classify: categorizing hidden web databases
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Boolean Query Mapping Across Heterogeneous Information Sources
IEEE Transactions on Knowledge and Data Engineering
Machine Learning
Machine Learning
Approximate Query Translation Across Heterogeneous Information Sources
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Proceedings of the 27th International Conference on Very Large Data Bases
Are Web Services the Next Revolution in e-Commerce? (Panel)
Proceedings of the 27th International Conference on Very Large Data Bases
Automatic Repairing of Web Wrappers by Combining Redundant Views
ICTAI '02 Proceedings of the 14th IEEE International Conference on Tools with Artificial Intelligence
Concept Hierarchy Based Text Database Categorization in a Metasearch Engine Environment
WISE '00 Proceedings of the First International Conference on Web Information Systems Engineering (WISE'00)-Volume 1 - Volume 1
Distributed search over the hidden web: hierarchical database sampling and selection
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Hi-index | 0.00 |
This paper studies the problem of automatic acquisition of the query languages supported by a Web information resource. We describe a system that automatically probes the search interface of a resource with a set of test queries and analyses the returned pages to recognize supported query operators. The automatic acquisition assumes the availability of the number of matches the resource returns for a submitted query. The match numbers are used to train a learning system and to generate classification rules that recognize the query operators supported by a provider and their syntactic encodings. These classification rules are employed during the automatic probing of new providers to determine query operators they support. We report on results of experiments with a set of real Web resources.