High Performance OLAP and Data Mining on Parallel Computers

  • Authors:
  • Sanjay Goil; Alok Choudhary

  • Affiliations:
  • Department of Electrical and Computer Engineering and Center for Parallel and Distributed Computing, Northwestern University, Evanston, IL 60201; Department of Electrical and Computer Engineering and Center for Parallel and Distributed Computing, Northwestern University, Evanston, IL 60201

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 1997

Abstract

On-Line Analytical Processing (OLAP) techniques are increasingly being used in decision support systems to provide analysis of data. Queries posed on such systems are quite complex and require different views of data. Analytical models need to capture the multidimensionality of the underlying data, a task for which multidimensional databases are well suited. Multidimensional OLAP systems store data in multidimensional arrays on which analytical operations are performed. Knowledge discovery and data mining require complex operations on the underlying data, which can be very expensive in terms of computation time. High performance parallel systems can reduce this analysis time. Precomputed aggregate calculations in a Data Cube can provide efficient query processing for OLAP applications. In this article, we present algorithms for construction of data cubes on distributed-memory parallel computers. Data is loaded from a relational database into a multidimensional array. We present two methods, sort-based and hash-based, for loading the base cube and compare their performances. Data cubes are used to perform consolidation queries used in roll-up operations using dimension hierarchies. Finally, we show how data cubes are used for data mining using Attribute Focusing techniques. We present results for these on the IBM-SP2 parallel machine. Results show that our algorithms and techniques for OLAP and data mining on parallel systems are scalable to a large number of processors, providing a high performance platform for such applications.
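
To make the terms in the abstract concrete, the following is a minimal serial sketch of what a data cube and a roll-up over a dimension hierarchy compute. It is not the authors' parallel algorithm: it runs on one process, recomputes every group-by directly from the rows, and uses a hash map per group-by. The fact table `ROWS`, the dimension names in `DIMS`, the helpers `build_cube` and `roll_up`, and the month-to-quarter hierarchy are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (product, region, month, sales). Not from the article.
ROWS = [
    ("pen", "east", "jan", 10),
    ("pen", "west", "jan", 5),
    ("ink", "east", "feb", 7),
    ("pen", "east", "feb", 3),
]
DIMS = ("product", "region", "month")


def build_cube(rows, dims):
    """Compute the SUM aggregate for every subset of dimensions (2^d group-bys).

    Naive approach: each group-by is aggregated straight from the rows
    into a hash map keyed by the selected dimension values.
    """
    cube = {}
    for k in range(len(dims) + 1):
        for idx in combinations(range(len(dims)), k):
            agg = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in idx)
                agg[key] += row[-1]  # the measure is the last column
            cube[tuple(dims[i] for i in idx)] = dict(agg)
    return cube


def roll_up(groupby, dim_pos, hierarchy):
    """Consolidate one dimension of a group-by to a coarser hierarchy level."""
    out = defaultdict(int)
    for key, value in groupby.items():
        coarse = key[:dim_pos] + (hierarchy[key[dim_pos]],) + key[dim_pos + 1:]
        out[coarse] += value
    return dict(out)


cube = build_cube(ROWS, DIMS)

# Assumed hierarchy: months consolidate into quarters.
MONTH_TO_QUARTER = {"jan": "q1", "feb": "q1"}
by_quarter = roll_up(cube[("month",)], 0, MONTH_TO_QUARTER)
```

With three dimensions the cube holds eight group-bys, from the empty group-by (the grand total) up to the full `("product", "region", "month")` base cube. A practical implementation would derive coarser aggregates from finer ones that are already computed, and would partition the array across processors, which is the setting the article addresses.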