Fast Loads and Fast Queries

Authors:
Goetz Graefe
Affiliations:
Hewlett-Packard Laboratories
Venue:
DaWaK '09 Proceedings of the 11th International Conference on Data Warehousing and Knowledge Discovery
Year:
2009

Citing 11
Cited 2

Making B+-trees cache conscious in main memory

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
A compact B-tree

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
The evolution of effective B-tree: page organization and techniques: a personal account

ACM SIGMOD Record
B-Tree Indexes and CPU Caches

Proceedings of the 17th International Conference on Data Engineering
Efficient Bulk Deletes in Relational Databases

Proceedings of the 17th International Conference on Data Engineering
Small Materialized Aggregates: A Light Weight Index Structure for Data Warehousing

VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
A Study of Index Structures for Main Memory Database Management Systems

VLDB '86 Proceedings of the 12th International Conference on Very Large Data Bases
Weaving Relations for Cache Performance

Proceedings of the 27th International Conference on Very Large Data Bases
Efficient Search of Multi-Dimensional B-Trees

VLDB '95 Proceedings of the 21st International Conference on Very Large Data Bases
B-tree indexes, interpolation search, and skew

DaMoN '06 Proceedings of the 2nd international workshop on Data management on new hardware
Organization and maintenance of large ordered indices

SIGFIDET '70 Proceedings of the 1970 ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control

Time-HOBI: indexing dimension hierarchies by means of hierarchically organized bitmaps

DOLAP '10 Proceedings of the ACM 13th international workshop on Data warehousing and OLAP
Time-HOBI: Index for optimizing star queries

Information Systems

Abstract

For efficient query processing, a relational table should be indexed in multiple ways; for efficient database loading, indexes should be omitted. Moerkotte's "small materialized aggregates" can be used to alleviate this tension, notably in the form of Netezza's "zone maps." Their most significant advantageous characteristics are that (i) load bandwidth is maximized by avoiding the cost of index maintenance, (ii) there is no need for complex index tuning, and (iii) scans for typical queries are very fast. Their most significant limiting characteristics are that (iv) they are effective only for query predicates on columns correlated with the load sequence, (v) individual outlier values can sharply reduce their effectiveness, and (vi) they fail to improve search performance within a zone. In this research, we introduce zone filters and zone indexes that address these limitations without reducing the advantages. The new data structures can be created as side effects of the load process, with all required analyses accomplished while a moderate amount of new data still remains in the buffer pool. Traditional sorting and indexing are not required. Nonetheless, query performance matches that of zone maps where those apply, exceeds it for predicates for which zone maps are ineffective, and can be comparable to query processing in a database with traditional indexing, as demonstrated in our simulations.
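
To make the baseline concrete, the following is a minimal sketch of a plain zone map: a minimum and maximum value recorded per fixed-size block of rows as the data is loaded, then used to skip blocks at query time. It illustrates only the existing zone-map idea and limitation (v) above; it does not implement the zone filters or zone indexes proposed in the paper. The names `Zone`, `build_zone_map`, `zones_to_scan`, and `range_query`, the integer column, and the fixed zone size are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Zone:
    start: int    # index of the first row in the zone
    end: int      # index one past the last row in the zone
    min_val: int  # smallest column value seen in the zone
    max_val: int  # largest column value seen in the zone


def build_zone_map(column: Sequence[int], zone_size: int) -> List[Zone]:
    """Record min/max per zone in load order; no sorting, no index."""
    zones = []
    for start in range(0, len(column), zone_size):
        chunk = column[start:start + zone_size]
        zones.append(Zone(start, start + len(chunk), min(chunk), max(chunk)))
    return zones


def zones_to_scan(zones: List[Zone], low: int, high: int) -> List[Zone]:
    """Keep only zones whose [min, max] range overlaps [low, high]."""
    return [z for z in zones if z.max_val >= low and z.min_val <= high]


def range_query(column: Sequence[int], zones: List[Zone],
                low: int, high: int) -> List[int]:
    """Return row indexes with low <= value <= high, skipping pruned zones."""
    hits = []
    for z in zones_to_scan(zones, low, high):
        # Within a surviving zone, every row is still checked (limitation vi).
        for i in range(z.start, z.end):
            if low <= column[i] <= high:
                hits.append(i)
    return hits
```

As a hypothetical usage, loading the values 0 through 99 in order with a zone size of 10 lets a query for the range 25 to 34 scan only 2 of the 10 zones. Replacing the single value at row 5 with an outlier of 1000 widens the first zone's range to [0, 1000], so the same query must scan 3 zones even though the result rows are unchanged.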