K-relevance: a spectrum of relevance for data sources impacting a query

  • Authors:
  • Jiansheng Huang; Jeffrey F. Naughton

  • Affiliations:
  • University of Wisconsin at Madison, Madison, WI; University of Wisconsin at Madison, Madison, WI

  • Venue:
  • Proceedings of the 2007 ACM SIGMOD international conference on Management of data
  • Year:
  • 2007

Abstract

Applications ranging from grid management to sensor nets to web-based information integration and extraction can be viewed as receiving data from some number of autonomous remote data sources and then answering queries over this collected data. In such environments it is helpful to inform users which data sources are "relevant" to their query results. It is not immediately obvious what "relevant" should mean in this context, as different users will have different requirements. In this paper, rather than proposing a single definition of relevance, we propose a spectrum of definitions, which we term "k-relevance", for k ≥ 0. We give algorithms for identifying k-relevant data sources for relational queries and explore their efficiency both analytically and experimentally. Finally, we explore the impact of integrity constraints (including dependencies) and materialized views on the problem of computing and maintaining relevant data sources.