Join queries with external text sources: execution and optimization techniques

  • Authors:
  • Surajit Chaudhuri;Umeshwar Dayal;Tak W. Yan

  • Affiliations:
  • Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA;Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA;Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA and Department of Computer Science, Stanford University, Stanford, CA

  • Venue:
  • SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
  • Year:
  • 1995

Abstract

Text is a pervasive information type, and many applications require querying over text sources in addition to structured data. This paper studies the problem of query processing in a system that loosely integrates an extensible database system and a text retrieval system. We focus on a class of conjunctive queries that include joins between text and structured data, in addition to selections over these two types of data. We adapt techniques from distributed query processing and introduce a novel class of join methods based on probing that is especially useful for joins with text systems, and we present a cost model for the various alternative query processing methods. Experimental results confirm the utility of these methods. The space of query plans is extended due to the additional techniques, and we describe an optimization algorithm for searching this extended space. The techniques we describe in this paper are applicable to other types of external data managers loosely integrated with a database system.