COMBI-operator - database support for data mining applications

  • Authors:
  • Alexander Hinneburg;Dirk Habich;Wolfgang Lehner

  • Affiliations:
  • Martin-Luther-University of Halle;Martin-Luther-University of Halle;Dresden University of Technology

  • Venue:
  • VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
  • Year:
  • 2003

Abstract

Database support for data mining has become an important research topic. Especially for large high-dimensional data volumes, comprehensive support from the database side is necessary. In this paper we first identify the data-intensive subproblem of aggregating high-dimensional data in all possible low-dimensional projections (for instance, estimating low-dimensional histograms), which occurs in several established data mining techniques. Second, we show that existing OLAP SQL extensions are insufficient for high-dimensional data and propose a new SQL operator, which fits seamlessly into the set of existing OLAP Group By operators. Third, we propose efficient implementations of the operator, which take the limited resources of main memory into account. We demonstrate on a number of real and synthetic data sets that, for the identified subproblem, our new implementations yield a large speedup (up to a factor of 10) over existing methods built into commercially available database systems.
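To make the subproblem concrete, the following is a minimal Python sketch of aggregating a data set in every k-dimensional projection by counting points per grid cell. It is only an illustration of the task named in the abstract, not the proposed SQL operator or any of the paper's implementations. The function name `projection_histograms`, the parameters `k`, `bins`, `lo`, and `hi`, the equal-width binning, and the sample rows are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def projection_histograms(rows, k, bins, lo=0.0, hi=1.0):
    """Count points per grid cell in every k-dimensional projection.

    Hypothetical helper for illustration: assumes every attribute lies
    in [lo, hi] and uses equal-width bins on each attribute.
    """
    d = len(rows[0])
    width = (hi - lo) / bins
    # One histogram per combination of k attribute indices.
    histograms = {dims: Counter() for dims in combinations(range(d), k)}
    for row in rows:
        # Discretize each attribute once per row; clamp v == hi into the last bin.
        cells = [min(int((v - lo) / width), bins - 1) for v in row]
        # A single pass over the data updates all projections.
        for dims, hist in histograms.items():
            hist[tuple(cells[i] for i in dims)] += 1
    return histograms


# Made-up sample data: three points with four attributes each.
rows = [
    (0.10, 0.20, 0.90, 0.40),
    (0.15, 0.80, 0.95, 0.45),
    (0.70, 0.25, 0.10, 0.60),
]
hists = projection_histograms(rows, k=2, bins=2)
print(len(hists))      # number of 2-dimensional projections of 4 attributes
print(hists[(0, 2)])   # cell counts for the projection onto attributes 0 and 2
```

With d attributes there are "d choose k" projections, so the number of histograms held at once grows quickly with d. This naive version keeps all of them in memory at the same time, which is the kind of resource pressure the abstract says the proposed implementations take into account.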