COMBI-operator - database support for data mining applications

Authors:
Alexander Hinneburg;Dirk Habich;Wolfgang Lehner
Affiliations:
Martin-Luther-University of Halle;Martin-Luther-University of Halle;Dresden University of Technology
Venue:
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Year:
2003

Citing 23
Cited 5

Cubetree: organization of and bulk incremental updates on the data cube

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
An array-based algorithm for simultaneous multidimensional aggregates

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Query flocks: a generalization of association-rule mining

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Integrating association rule mining with relational database systems: alternatives and implications

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Bottom-up computation of sparse and Iceberg CUBE

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Finding generalized projected clusters in high dimensional spaces

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
SQLEM: fast clustering in SQL using the EM algorithm

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Efficient computation of Iceberg cubes with complex measures

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
SQL database primitives for decision tree classifiers

Proceedings of the tenth international conference on Information and knowledge management
A Monte Carlo algorithm for fast projective clustering

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
HD-Eye: visual clustering of high dimensional data

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total

ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
Integrating Data Mining with SQL Databases: OLE DB for Data Mining

Proceedings of the 17th International Conference on Data Engineering
Fast Computation of Sparse Datacubes

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
RainForest - A Framework for Fast Decision Tree Construction of Large Datasets

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Computing Iceberg Queries Efficiently

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Optimal Grid-Clustering: Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
User-Defined Table Operators: Enhancing Extensibility for ORDBMS

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Using SQL to Build New Aggregates and Extenders for Object- Relational Systems

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Analyzing Quantitative Databases: Image is Everything

Proceedings of the 27th International Conference on Very Large Data Bases
On the Computation of Multidimensional Aggregates

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
A New SQL-like Operator for Mining Association Rules

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Efficient Evaluation of Queries with Mining Predicates

ICDE '02 Proceedings of the 18th International Conference on Data Engineering

Horizontal aggregations for building tabular data sets

Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Efficient computation of multiple group by queries

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Embedded predictive modeling in a parallel relational database

Proceedings of the 2006 ACM symposium on Applied computing
Building statistical models and scoring with UDFs

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
A top-down approach for compressing data cubes under the simultaneous evaluation of multiple hierarchical range queries

Journal of Intelligent Information Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Database support for data mining has become an important research topic. Especially for large high-dimensional data volumes, comprehensive support from the database side is necessary. In this paper we identify the data intensive subproblem of aggregating high-dimensional data in all possible low-dimensional projections (for instance estimating low-dimensional histograms), which occurs in several established data mining techniques. Second, we show that existing OLAP SQL-extensions are insufficient for high-dimensional data and propose a new SQL-operator, which seamlessly fits into the set of existing OLAP Group By operators. Third, we propose efficient implementations for the operator, which take the limited resources of main memory into account. We demonstrate on a number of real and synthetic data sets that for the identified subproblem our new implementations yield a large speedup (up to factor 10) over existing methods built in commercially available database systems.