VarDB: high-performance warehouse processing with massive ordering and binary search

  • Authors:
  • Pedro Martins;João Costa;José Cecílio;Pedro Furtado

  • Affiliations:
  • University of Coimbra, Coimbra, Portugal;University of Coimbra, Coimbra, Portugal;University of Coimbra, Coimbra, Portugal;University of Coimbra, Coimbra, Portugal

  • Venue:
  • DaWaK'11: Proceedings of the 13th International Conference on Data Warehousing and Knowledge Discovery
  • Year:
  • 2011

Abstract

Current database management systems (DBMS) compete aggressively on performance. To that end, they adopt new storage schemas, develop better compression algorithms, use faster hardware, and optimize parallel and distributed data processing. Current row-wise systems do not exploit massive ordering redundancy, and current column-wise approaches exploit it only partially. An important current research issue is replacing complex optimization and processing with simpler but ultra-fast solutions. We propose the varDB approach to optimize performance over data warehouses. The solution minimizes complex operators by applying a simple scheme and organizing all structures and processing around it: massive ordering with efficient sorting and log₂ N searching. For data warehouses, with periodic loads and frequent analysis operations, such an approach provides very fast query processing. We show how this massive data ordering can be used to optimize queries for high speed, even without data compression (thereby also avoiding compression and decompression overheads). We focus on sorting columns of data and correlating them with other replicated, unsorted columns. For querying, we rely on binary search and mainly on offsets. We tested data loading, sorting versus index creation, and very selective operations such as data filtering and joining on a simple disk-based prototype. The results show much better performance than optimized row-wise engines, and improvements over optimized column-wise engines as well. Compared with the latter, we attained at least similar performance for many queries and much better performance for queries with complex joins.
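The core idea of sorted columns, binary search, and offsets can be illustrated with a small in-memory sketch. This is not the authors' disk-based prototype. The function names (`build_sorted_column`, `range_filter`) and the toy `price` and `region` columns are assumptions made up for this example. It only shows how a sorted copy of one column, paired with the row offset of each value, lets a selective range filter run as two binary searches and then fetch matching values from an unsorted column by offset.

```python
from bisect import bisect_left, bisect_right


def build_sorted_column(values):
    """Return (sorted_values, row_offsets) for one column.

    row_offsets[k] is the position, in the original unsorted column,
    of the value stored at sorted position k.
    """
    order = sorted(range(len(values)), key=values.__getitem__)
    return [values[i] for i in order], order


def range_filter(sorted_values, row_offsets, low, high):
    """Row offsets whose value v satisfies low <= v <= high.

    Uses two binary searches over the sorted column, so locating the
    matching range costs O(log N) comparisons.
    """
    start = bisect_left(sorted_values, low)
    end = bisect_right(sorted_values, high)
    return row_offsets[start:end]


# Hypothetical toy table: two unsorted columns sharing row positions.
price = [30, 10, 50, 20, 40]
region = ["N", "S", "E", "W", "N"]

# "Load" step: sort the filter column once and keep its row offsets.
sorted_price, price_offsets = build_sorted_column(price)

# "Query" step: filter on price, then read region values by offset.
hits = range_filter(sorted_price, price_offsets, 20, 40)
matched_regions = [region[i] for i in hits]
```

In this sketch the sorting cost is paid once at load time, which fits the periodic-load, frequent-analysis workload the abstract describes. Each query then touches only the matching slice of the sorted column and the corresponding offsets.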