Fast Loads and Fast Queries

  • Authors:
  • Goetz Graefe

  • Affiliations:
  • Hewlett-Packard Laboratories

  • Venue:
  • DaWaK '09 Proceedings of the 11th International Conference on Data Warehousing and Knowledge Discovery
  • Year:
  • 2009

Abstract

For efficient query processing, a relational table should be indexed in multiple ways; for efficient database loading, indexes should be omitted. Moerkotte's "small materialized aggregates" can alleviate this tension, notably in the form of Netezza's "zone maps." Their most significant advantages are that (i) load bandwidth is maximized by avoiding the cost of index maintenance, (ii) there is no need for complex index tuning, and (iii) scans for typical queries are very fast. Their most significant limitations are that (iv) they are effective only for query predicates on columns correlated with the load sequence, (v) individual outlier values can sharply reduce their effectiveness, and (vi) they fail to improve search performance within a zone. In this research, we introduce zone filters and zone indexes that address these limitations without reducing the advantages. The new data structures can be created as side effects of the load process, with all required analyses accomplished while a moderate amount of new data still remains in the buffer pool. Traditional sorting and indexing are not required. Nonetheless, query performance matches that of zone maps where those apply, exceeds it for predicates on which zone maps are ineffective, and can be comparable to query processing in a database with traditional indexing, as demonstrated in our simulations.
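To make limitations (iv) through (vi) concrete, the following is a minimal sketch of the zone-map baseline only: each zone of loaded rows keeps a min/max summary computed during the load, and a range scan skips any zone whose interval cannot overlap the predicate. It does not implement the paper's zone filters or zone indexes, whose design the abstract does not describe. The names `Zone`, `load_with_zone_map`, and `range_scan`, and the fixed zone size of 100 rows, are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Zone:
    """One zone: a contiguous run of rows in load order plus its min/max summary."""
    rows: List[int]
    lo: int
    hi: int


def load_with_zone_map(values: List[int], zone_size: int) -> List[Zone]:
    """Append values in load order, starting a new zone every zone_size rows.

    The min/max summary is computed while the zone's rows are still in memory,
    so no sorting and no index maintenance are needed (hypothetical sketch).
    """
    zones = []
    for start in range(0, len(values), zone_size):
        chunk = values[start:start + zone_size]
        zones.append(Zone(rows=chunk, lo=min(chunk), hi=max(chunk)))
    return zones


def range_scan(zones: List[Zone], low: int, high: int) -> Tuple[List[int], int]:
    """Return values with low <= v <= high and the number of zones actually scanned."""
    result, scanned = [], 0
    for z in zones:
        if z.hi < low or z.lo > high:
            continue  # the zone map proves no row in this zone can qualify
        scanned += 1
        # Inside a surviving zone every row is still examined (limitation vi).
        result.extend(v for v in z.rows if low <= v <= high)
    return result, scanned


if __name__ == "__main__":
    # Column correlated with load order: only one zone survives pruning.
    print(range_scan(load_with_zone_map(list(range(1000)), 100), 250, 260)[1])  # 1

    # A single outlier widens zone 0 to [0, 900], so it can no longer be skipped.
    vals = list(range(1000))
    vals[50] = 900
    print(range_scan(load_with_zone_map(vals, 100), 250, 260)[1])  # 2

    # Column uncorrelated with load order: every zone spans [0, 99].
    print(range_scan(load_with_zone_map([i % 100 for i in range(1000)], 100), 50, 60)[1])  # 10
```

The three runs correspond to the abstract's points: pruning works well when values follow the load sequence (iii), a single outlier forces an extra zone to be read (v), and a column uncorrelated with the load sequence defeats pruning entirely (iv).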