K-graphs: selecting top-k data sources for XML keyword queries

  • Authors:
  • Khanh Nguyen;Jinli Cao

  • Affiliations:
  • Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Australia;Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Australia

  • Venue:
  • DEXA'11 Proceedings of the 22nd international conference on Database and expert systems applications - Volume Part I
  • Year:
  • 2011

Abstract

Most existing approaches to XML keyword search focus on querying a single data source. However, searching hundreds or even thousands of (distributed) data sources by querying each one in turn is extremely costly and can therefore be impractical. In this paper, we propose an approach for selecting the top-k data sources for a given query, in order to avoid the high cost of searching numerous, potentially irrelevant data sources. The proposed approach can efficiently select the top-k most relevant data sources without querying the data sources themselves. We propose a ranking function that measures the strength of correlation between keywords in a data source, and we summarize each data source as a keyword correlation graph (K-Graph). The top-k relevant data sources are then selected by estimating the relevance of the corresponding K-Graphs to the query. Experimental results show that the approach achieves good performance across a variety of experimental parameter settings.
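The pipeline described in the abstract (summarize each source as a keyword graph, score each graph against the query, keep the k best) can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not specify the ranking function, so the edge weight below is a Jaccard co-occurrence measure chosen purely as a stand-in, and the input format (one keyword set per XML element) and the names `build_k_graph`, `estimate_relevance`, and `select_top_k` are all hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import combinations


def build_k_graph(elements):
    """Summarize one data source as a keyword correlation graph.

    `elements` is a list of keyword sets, one per XML element (an assumed
    input format). Edge weight is the Jaccard overlap of the element sets
    in which two keywords occur: a stand-in, not the paper's function.
    """
    postings = defaultdict(set)
    for idx, keywords in enumerate(elements):
        for kw in keywords:
            postings[kw].add(idx)
    edges = {}
    for a, b in combinations(sorted(postings), 2):
        shared = len(postings[a] & postings[b])
        if shared:
            edges[(a, b)] = shared / len(postings[a] | postings[b])
    return set(postings), edges


def estimate_relevance(k_graph, query):
    """Score a K-Graph against a query using only the summary graph."""
    nodes, edges = k_graph
    terms = sorted(set(query))
    # A source that lacks any query keyword is treated as irrelevant.
    if not terms or any(t not in nodes for t in terms):
        return 0.0
    if len(terms) == 1:
        return 1.0
    # Average correlation over all pairs of query keywords.
    pairs = list(combinations(terms, 2))
    return sum(edges.get(p, 0.0) for p in pairs) / len(pairs)


def select_top_k(k_graphs, query, k):
    """Return the names of the k best-scoring sources, dropping zero scores."""
    scored = ((estimate_relevance(g, query), name) for name, g in k_graphs.items())
    return [name for score, name in heapq.nlargest(k, scored) if score > 0]


# Toy sources: each inner set holds the keywords of one XML element.
sources = {
    "dblp": [{"xml", "keyword", "search"}, {"xml", "search"}, {"graph"}],
    "movies": [{"xml"}, {"search"}, {"film"}],
    "sigmod": [{"xml", "search"}, {"xml"}, {"search"}],
}
graphs = {name: build_k_graph(elements) for name, elements in sources.items()}
top_sources = select_top_k(graphs, ["xml", "search"], k=2)
```

The point the sketch tries to capture is that the graphs are built once, offline, and selection then consults only these summaries, so no source is queried at selection time. Averaging pairwise weights is one of several plausible aggregations; taking the minimum would instead penalize any weakly correlated keyword pair.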