Querying complex structured databases

  • Authors: Cong Yu; H. V. Jagadish

  • Affiliations: University of Michigan; University of Michigan

  • Venue: VLDB '07: Proceedings of the 33rd International Conference on Very Large Data Bases
  • Year: 2007

Abstract

Correctly generating a structured query (e.g., an XQuery or a SQL query) requires the user to have a full understanding of the database schema, which can be a daunting task. Alternative query models have been proposed to give users the ability to query the database without schema knowledge. Those models, including simple keyword search and labeled keyword search, aim to extract meaningful data fragments that match the structure-free query conditions (e.g., keywords) based on various matching semantics. Typically, the matching semantics are content-based: they are defined on data node inter-relationships and incur significant query evaluation cost. Our first contribution is a novel matching semantics based on analyzing the database schema. We show that query models employing a schema-based matching semantics can reduce query evaluation cost significantly while maintaining or even improving result quality. The adoption of schema-based matching semantics does not change the nature of those query models: they are still schema-ignorant, i.e., users express no schema knowledge (except the labels in labeled keyword search) in the query. While those models work well for some queries on some databases, they often encounter problems when applied to complex queries on databases with complex schemas. Our second contribution is a novel query model that incorporates partial schema knowledge through the use of a schema summary. This new summary-aware query model, called Meaningful Summary Query (MSQ), seamlessly integrates summary-based structural conditions and structure-free conditions, and enables ordinary users to query complex databases. We design algorithms for evaluating MSQ queries and demonstrate that, compared with previous approaches, MSQ queries can produce better results against complex databases and can be evaluated efficiently.