A Parallel Scalable Infrastructure for OLAP and Data Mining

Authors:
Sanjay Goil; Alok Choudhary
Venue:
IDEAS '99 Proceedings of the 1999 International Symposium on Database Engineering & Applications
Year:
1999

Abstract

Decision support systems help businesses such as banking, insurance, retail, and health care, among many others, leverage the information held in their data warehouses. The multi-dimensional aspects of a business can be expressed naturally using a multi-dimensional data model. Data analysis and data mining on these warehouses pose new challenges for traditional database systems. OLAP and data mining operations require summary information on these multi-dimensional data sets, and query processing for these applications requires different views of the data for analysis and effective decision making. Data mining techniques can be applied in conjunction with OLAP for an integrated business solution. As data warehouses grow, parallel processing techniques have been applied to enable the use of larger data sets and to reduce analysis time, thereby allowing many more options to be evaluated for decision making.

In this paper we address (1) scalability in multi-dimensional systems for OLAP and multi-dimensional analysis, (2) integration of data mining with the OLAP framework, and (3) high performance through parallel processing for OLAP and data mining. We describe our system PARSIMONY (Parallel and Scalable Infrastructure for Multidimensional Online analytical processing). This platform is used for both OLAP and data mining. Sparsity of data sets is handled with sparse chunks that are compressed using a bit-encoded sparse structure, which enables aggregate operations directly on the compressed data. We present techniques that use the summary information available in data cubes to mine association rules and to perform decision-tree based classification. These techniques take advantage of the data organization provided by the multi-dimensional data model.

Performance results for high-dimensional data sets on a distributed memory parallel machine (IBM SP-2) show good speedup and scalability.
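To make the idea of aggregating on compressed data concrete, here is a minimal sketch of a bit-encoded sparse chunk. It is an illustration under stated assumptions, not the PARSIMONY implementation: the class name `SparseChunk`, the method `aggregate_out`, and the bit layout (one fixed-width field per dimension, packed into a single integer key) are all hypothetical. The sketch stores only non-empty cells and sums out a dimension by masking that dimension's bits in each key, without expanding the chunk into a dense array.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def bits_needed(size: int) -> int:
    """Number of bits required to represent offsets 0 .. size-1."""
    return max(1, (size - 1).bit_length())


class SparseChunk:
    """Hypothetical sparse chunk: each non-empty cell is (bit-encoded key, value)."""

    def __init__(self, dim_sizes: Sequence[int]) -> None:
        self.dim_sizes = list(dim_sizes)
        self.widths = [bits_needed(s) for s in self.dim_sizes]
        # Assumed layout: the last dimension occupies the lowest-order bits.
        self.shifts: List[int] = []
        shift = 0
        for width in reversed(self.widths):
            self.shifts.append(shift)
            shift += width
        self.shifts.reverse()
        self.cells: List[Tuple[int, float]] = []

    def encode(self, coords: Sequence[int]) -> int:
        """Pack per-dimension offsets into one integer key."""
        key = 0
        for c, size, shift in zip(coords, self.dim_sizes, self.shifts):
            if not 0 <= c < size:
                raise ValueError(f"offset {c} out of range for size {size}")
            key |= c << shift
        return key

    def decode(self, key: int) -> Tuple[int, ...]:
        """Unpack an integer key back into per-dimension offsets."""
        return tuple(
            (key >> shift) & ((1 << width) - 1)
            for shift, width in zip(self.shifts, self.widths)
        )

    def add(self, coords: Sequence[int], value: float) -> None:
        """Store one non-empty cell in compressed form."""
        self.cells.append((self.encode(coords), value))

    def aggregate_out(self, dim: int) -> Dict[Tuple[int, ...], float]:
        """Sum over dimension `dim` by clearing its bit field in every key."""
        field_mask = ((1 << self.widths[dim]) - 1) << self.shifts[dim]
        totals: Dict[int, float] = defaultdict(float)
        for key, value in self.cells:
            totals[key & ~field_mask] += value
        # Decode only the (usually far fewer) aggregated keys.
        result: Dict[Tuple[int, ...], float] = {}
        for key, total in totals.items():
            coords = self.decode(key)
            result[coords[:dim] + coords[dim + 1:]] = total
        return result
```

For example, a chunk created with `SparseChunk([4, 3, 5])` holds cells of a 4 x 3 x 5 region; after a few calls to `add`, `aggregate_out(0)` returns the sums grouped by the remaining two dimensions. Masking works on the stored keys directly, so the cost is proportional to the number of non-empty cells rather than to the chunk volume.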