Intelligent Data Granulation on Load: Improving Infobright's Knowledge Grid

Authors:
Dominik Ślęzak; Marcin Kowalski
Affiliations:
Institute of Mathematics, University of Warsaw, Warsaw, Poland 02-097; Infobright Inc., Poland, Warsaw, Poland 02-078
Venue:
FGIT '09 Proceedings of the 1st International Conference on Future Generation Information Technology
Year:
2009

References:
BIRCH: an efficient data clustering method for very large databases. SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data.
CURE: an efficient clustering algorithm for large databases. SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data.
Data clustering: a review. ACM Computing Surveys (CSUR).
The knowledge grid. Communications of the ACM.
Handbook of data mining and knowledge discovery.
Incremental Clustering and Dynamic Information Retrieval. SIAM Journal on Computing.
C-store: a column-oriented DBMS. VLDB '05 Proceedings of the 31st international conference on Very large data bases.
Data Streams: Models and Algorithms (Advances in Database Systems).
The history of histograms (abridged). VLDB '03 Proceedings of the 29th international conference on Very large data bases, Volume 29.
Efficient query processing for multi-dimensionally clustered tables in DB2. VLDB '03 Proceedings of the 29th international conference on Very large data bases, Volume 29.
Self-tuning database systems: a decade of progress. VLDB '07 Proceedings of the 33rd international conference on Very large data bases.
Configuration-parametric query optimization for physical design tuning. Proceedings of the 2008 ACM SIGMOD international conference on Management of data.
Handbook of Granular Computing.
Brighthouse: an analytic data warehouse for ad-hoc queries. Proceedings of the VLDB Endowment.
Architecture of a Database System. Foundations and Trends in Databases.
The Database Architecture Jigsaw Puzzle. ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering.
Data warehouse technology by Infobright. Proceedings of the 2009 ACM SIGMOD International Conference on Management of data.
Semantic knowledge integration to support inductive query optimization. DaWaK '07 Proceedings of the 9th international conference on Data Warehousing and Knowledge Discovery.

Cited by:
Injecting domain knowledge into a granular database engine: a position paper. CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management.
Towards approximate SQL: Infobright's approach. RSCTC '10 Proceedings of the 7th international conference on Rough sets and current trends in computing.

Abstract

One of the major aspects of Infobright's relational database technology is the automatic decomposition of each data table into Rough Rows, each consisting of 64K original rows. Rough Rows are automatically annotated by Knowledge Nodes, which hold compact information about the rows' values. Query performance depends on the quality of the Knowledge Nodes, i.e., on how effectively they minimize access to the compressed portions of data stored on disk under the specific query optimization procedures. We show how to implement a mechanism that organizes incoming data into Rough Rows so as to maximize the quality of the corresponding Knowledge Nodes. Given clear business-driven requirements, the mechanism must be fully integrated with the data load process and must not decrease data load speed. The performance gain resulting from better data organization is illustrated by tests on our benchmark data. We also discuss how the proposed mechanism differs from well-known database clustering and partitioning procedures. This paper continues the work described in our patent application [22].
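The abstract describes Knowledge Nodes only as "compact information" about each Rough Row. The minimal sketch below illustrates why the grouping of rows matters. It assumes the simplest possible summary, a per-column min/max pair, and a three-way classification of Rough Rows for a range query (irrelevant, relevant, suspect). It is not the mechanism from the paper and not Infobright code. The names `KnowledgeNode`, `classify`, `granulate_on_load`, and `ROUGH_ROW_SIZE` are hypothetical, and "sort each bounded buffer of incoming values before cutting Rough Rows" is a naive stand-in heuristic chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Illustrative only: the abstract specifies 64K (65,536) rows per Rough Row;
# a tiny size keeps the demo readable.
ROUGH_ROW_SIZE = 4


@dataclass
class KnowledgeNode:
    """Hypothetical min/max summary of one column within one Rough Row."""
    min_value: int
    max_value: int


def build_knowledge_nodes(values: Sequence[int],
                          size: int = ROUGH_ROW_SIZE) -> List[KnowledgeNode]:
    """Cut a column into consecutive Rough Rows and summarize each one."""
    return [KnowledgeNode(min(values[i:i + size]), max(values[i:i + size]))
            for i in range(0, len(values), size)]


def classify(node: KnowledgeNode, low: int, high: int) -> str:
    """Classify one Rough Row against the predicate low <= x <= high."""
    if node.max_value < low or node.min_value > high:
        return "irrelevant"  # no row can match: skip without decompressing
    if low <= node.min_value and node.max_value <= high:
        return "relevant"    # every row matches: the summary alone suffices
    return "suspect"         # the compressed data must be read and scanned


def count_suspect(nodes: Sequence[KnowledgeNode], low: int, high: int) -> int:
    """Number of Rough Rows whose data must be decompressed for the query."""
    return sum(classify(node, low, high) == "suspect" for node in nodes)


def granulate_on_load(incoming: Sequence[int], buffer_rows: int) -> List[int]:
    """Hypothetical load-time reordering (assumption, not the paper's method):
    sort each fixed-size buffer of incoming values before it is cut into
    Rough Rows, so that similar values tend to share a Rough Row.
    Only one buffer is reordered at a time, so memory use stays bounded."""
    reordered: List[int] = []
    for i in range(0, len(incoming), buffer_rows):
        reordered.extend(sorted(incoming[i:i + buffer_rows]))
    return reordered


if __name__ == "__main__":
    # Arrival order interleaves small and large values.
    incoming = [1, 50, 2, 51, 3, 52, 4, 53, 5, 54, 6, 55, 7, 56, 8, 57]
    as_loaded = build_knowledge_nodes(incoming)
    reordered = build_knowledge_nodes(granulate_on_load(incoming, buffer_rows=8))
    print([classify(n, 1, 10) for n in as_loaded])
    # ['suspect', 'suspect', 'suspect', 'suspect']
    print([classify(n, 1, 10) for n in reordered])
    # ['relevant', 'irrelevant', 'relevant', 'irrelevant']
    print(count_suspect(as_loaded, 1, 10), count_suspect(reordered, 1, 10))
    # 4 0
```

In arrival order, every Rough Row mixes small and large values, so all four are suspect for the query 1 <= x <= 10 and would have to be decompressed. After the 8-value buffers are reordered, each Rough Row lies entirely inside or entirely outside the range, so the min/max summaries answer the query without touching the data. A real loader would have to permute whole rows across all columns rather than a single column. Sorting also costs CPU time, which the abstract's requirement of no slowdown in data load would constrain.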