Intelligent Data Granulation on Load: Improving Infobright's Knowledge Grid

  • Authors:
  • Dominik Ślęzak; Marcin Kowalski

  • Affiliations:
  • Institute of Mathematics, University of Warsaw, Warsaw, Poland 02-097; Infobright Inc., Warsaw, Poland 02-078

  • Venue:
  • FGIT '09 Proceedings of the 1st International Conference on Future Generation Information Technology
  • Year:
  • 2009

Abstract

A central feature of Infobright's relational database technology is the automatic decomposition of each data table into Rough Rows, each consisting of 64K original rows. Rough Rows are automatically annotated with Knowledge Nodes, which hold compact information about the rows' values. Query performance depends on the quality of the Knowledge Nodes, i.e., on how effectively they minimize access to the compressed portions of data stored on disk under the specific query optimization procedures. We show how to implement a mechanism that organizes incoming data into Rough Rows so as to maximize the quality of the corresponding Knowledge Nodes. Given clear business-driven requirements, the mechanism must be fully integrated with the data load process and must not decrease the data load speed. The performance gain resulting from better data organization is illustrated by tests over our benchmark data. We also discuss the differences between the proposed mechanism and well-known procedures for database clustering and partitioning. The paper is a continuation of our patent application [22].
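
The link between data organization and Knowledge Node quality can be illustrated with a minimal sketch. It assumes the simplest possible Knowledge Node, a per-column min/max summary for each Rough Row; the abstract does not specify the node types or the granulation algorithm, so the names `KnowledgeNode`, `build_knowledge_nodes`, and `rough_rows_to_access` are hypothetical, and the example uses a small chunk size in place of 64K to stay readable.

```python
from dataclasses import dataclass
from typing import List, Sequence

ROUGH_ROW_SIZE = 64 * 1024  # 64K rows per Rough Row, as stated in the abstract


@dataclass
class KnowledgeNode:
    """Hypothetical min/max summary of one column within one Rough Row."""
    minimum: int
    maximum: int


def build_knowledge_nodes(values: Sequence[int],
                          size: int = ROUGH_ROW_SIZE) -> List[KnowledgeNode]:
    """Split a column into consecutive Rough Rows and summarize each one."""
    nodes = []
    for start in range(0, len(values), size):
        chunk = values[start:start + size]
        nodes.append(KnowledgeNode(min(chunk), max(chunk)))
    return nodes


def rough_rows_to_access(nodes: List[KnowledgeNode], low: int, high: int) -> int:
    """Count Rough Rows whose [min, max] range overlaps the query range
    [low, high]; all other Rough Rows can be skipped without reading data."""
    return sum(1 for n in nodes if n.maximum >= low and n.minimum <= high)


# Assumption for illustration: 1000 distinct values, 100 rows per Rough Row.
ordered = list(range(1000))
# The same values in round-robin order, so every Rough Row spans
# almost the whole value domain.
scattered = [(i % 100) * 10 + i // 100 for i in range(1000)]

ordered_hits = rough_rows_to_access(build_knowledge_nodes(ordered, 100), 250, 260)
scattered_hits = rough_rows_to_access(build_knowledge_nodes(scattered, 100), 250, 260)
print(ordered_hits, scattered_hits)
```

In this toy setting the well-organized column lets the range query skip all but one Rough Row, while the scattered column forces access to every one of them. The mechanism in the paper targets this kind of gap during load rather than through a full sort, which this sketch does not attempt to reproduce.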