Decision support systems are important for leveraging the information held in data warehouses in businesses such as banking, insurance, retail and health care, among many others. The multi-dimensional aspects of a business can be expressed naturally using a multi-dimensional data model. Data analysis and data mining on these warehouses pose new challenges for traditional database systems. OLAP and data mining operations require summary information on these multi-dimensional data sets, and query processing for these applications requires different views of the data for analysis and effective decision making. Data mining techniques can be applied in conjunction with OLAP for an integrated business solution. As data warehouses grow, parallel processing techniques have been applied to enable the use of larger data sets and to reduce analysis time, thereby allowing many more options to be evaluated for decision making.

In this paper we address (1) scalability in multi-dimensional systems for OLAP and multi-dimensional analysis, (2) integration of data mining with the OLAP framework, and (3) high performance through parallel processing for OLAP and data mining. We describe our system PARSIMONY - Parallel and Scalable Infrastructure for Multidimensional Online analytical processing. This platform is used for both OLAP and data mining. Sparse data sets are stored in sparse chunks that are compressed with a bit-encoded sparse structure, which allows aggregate operations to be performed directly on the compressed data. We present techniques that use the summary information available in data cubes for two data mining tasks: mining association rules and decision-tree based classification. These techniques take advantage of the data organization provided by the multi-dimensional data model.

Performance results for high-dimensional data sets on a distributed memory parallel machine (IBM SP-2) show good speedup and scalability.
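To make the idea of aggregating on compressed data concrete, the following is a minimal sketch of one way a bit-encoded sparse chunk could work. It is an illustration under stated assumptions, not the PARSIMONY implementation: the function names (`bit_widths`, `encode`, `decode`, `keep_mask`, `aggregate`), the dictionary-based chunk, and the sample data are all hypothetical. Each non-empty cell stores its dimension indices packed into one integer key, and summing over a dimension is done by masking that dimension's bits out of the key, without decompressing the chunk.

```python
from collections import defaultdict


def bit_widths(dim_sizes):
    """Bits needed to store an index in [0, size) for each dimension."""
    return [max(1, (size - 1).bit_length()) for size in dim_sizes]


def encode(coords, widths):
    """Pack per-dimension indices into a single integer key."""
    key = 0
    for index, width in zip(coords, widths):
        key = (key << width) | index
    return key


def decode(key, widths):
    """Unpack an integer key back into per-dimension indices."""
    coords = []
    for width in reversed(widths):
        coords.append(key & ((1 << width) - 1))
        key >>= width
    return tuple(reversed(coords))


def keep_mask(widths, keep_dims):
    """Bit mask with ones over the dimensions that survive aggregation."""
    mask = 0
    for dim, width in enumerate(widths):
        mask <<= width
        if dim in keep_dims:
            mask |= (1 << width) - 1
    return mask


def aggregate(chunk, widths, keep_dims):
    """Sum values over all dimensions not in keep_dims, on encoded keys."""
    mask = keep_mask(widths, keep_dims)
    result = defaultdict(int)
    for key, value in chunk.items():
        # Cells that differ only in aggregated dimensions share a masked key.
        result[key & mask] += value
    return dict(result)


# Hypothetical 3-dimensional sparse chunk (dimension sizes are assumptions).
dim_sizes = (4, 3, 5)
widths = bit_widths(dim_sizes)
cells = {(0, 1, 2): 10, (0, 1, 4): 5, (3, 2, 2): 7, (3, 0, 2): 1}
chunk = {encode(coords, widths): value for coords, value in cells.items()}

# Sum over dimension 1, keeping dimensions 0 and 2.
summed = aggregate(chunk, widths, keep_dims={0, 2})
summary = {decode(key, widths): value for key, value in summed.items()}
```

In this sketch the aggregated dimension decodes to index 0 in every result key, so `summary` holds one entry per surviving (dimension 0, dimension 2) combination. A real system would also need chunk offsets and a layout suited to parallel distribution, which are outside the scope of this example.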