Learning query languages of Web interfaces

Authors:
André Bergholz;Boris Chidlovskii
Affiliations:
Xerox Research Centre Europe, 6 chemin de Maupertuis, Meylan, France;Xerox Research Centre Europe, 6 chemin de Maupertuis, Meylan, France
Venue:
Proceedings of the 2004 ACM symposium on Applied computing
Year:
2004

Abstract

This paper studies the problem of automatically acquiring the query languages supported by a Web information resource. We describe a system that automatically probes the search interface of a resource with a set of test queries and analyses the returned pages to recognize the supported query operators. The approach assumes that the resource reports the number of matches for each submitted query. These match numbers are used to train a learning system and to generate classification rules that recognize the query operators supported by a provider and their syntactic encodings. The classification rules are then employed during the automatic probing of new providers to determine which query operators they support. We report the results of experiments with a set of real Web resources.
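
The probing idea in the abstract can be illustrated with a small sketch. This is not the paper's learned classifier: the rules below are written by hand, whereas the paper trains classification rules from match numbers. The function `classify_space_operator`, the toy corpus `DOCS`, and the three stand-in "engines" are all hypothetical names introduced here for illustration. The sketch only shows how the match counts returned for a few probe queries might hint at whether a space between two terms is treated as a conjunction, a disjunction, or a phrase.

```python
from typing import Callable

def classify_space_operator(match_count: Callable[[str], int], a: str, b: str) -> str:
    """Guess how an interface interprets the query 'a b' from match counts alone.

    match_count is a hypothetical callable that submits a query to the
    resource and returns the number of matches it reports.
    The thresholds are hand-written heuristics, not learned rules.
    """
    n_a = match_count(a)
    n_b = match_count(b)
    n_ab = match_count(f"{a} {b}")
    n_ba = match_count(f"{b} {a}")
    if n_ab > max(n_a, n_b):
        # More matches than either single term: behaves like OR.
        return "disjunction"
    if n_ab <= min(n_a, n_b):
        # A phrase match depends on word order, a conjunction does not.
        return "conjunction" if n_ab == n_ba else "phrase"
    return "unknown"

# Toy corpus and three stand-in search engines (assumptions for the demo).
DOCS = [
    "information retrieval systems",
    "retrieval of information",
    "query languages",
    "information systems",
    "document retrieval",
]

def and_engine(query: str) -> int:
    terms = query.split()
    return sum(all(t in d.split() for t in terms) for d in DOCS)

def or_engine(query: str) -> int:
    terms = query.split()
    return sum(any(t in d.split() for t in terms) for d in DOCS)

def phrase_engine(query: str) -> int:
    return sum(query in d for d in DOCS)

if __name__ == "__main__":
    for name, engine in [("and", and_engine), ("or", or_engine), ("phrase", phrase_engine)]:
        print(name, classify_space_operator(engine, "information", "retrieval"))
```

A real resource would need many probe queries and noise-tolerant rules, which is where the learning step described in the abstract comes in; the equality test between the two word orders, for instance, cannot separate a phrase from a conjunction when both counts happen to coincide.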