Learning query languages of Web interfaces

  • Authors:
  • André Bergholz;Boris Chidlovskii

  • Affiliations:
  • Xerox Research Centre Europe, 6 chemin de Maupertuis, Meylan, France;Xerox Research Centre Europe, 6 chemin de Maupertuis, Meylan, France

  • Venue:
  • Proceedings of the 2004 ACM symposium on Applied computing
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper studies the problem of automatic acquisition of the query languages supported by a Web information resource. We describe a system that automatically probes the search interface of a resource with a set of test queries and analyses the returned pages to recognize supported query operators. The automatic acquisition assumes the availability of the number of matches the resource returns for a submitted query. The match numbers are used to train a learning system and to generate classification rules that recognize the query operators supported by a provider and their syntactic encodings. These classification rules are employed during the automatic probing of new providers to determine query operators they support. We report on results of experiments with a set of real Web resources.