The current generation of data mining tools has limited capacity and performance because these tools tend to be sequential. This paper explores a migration path out of this bottleneck by considering an integrated hardware and software approach to parallelizing data mining. Our analysis shows that parallel data mining solutions require the following components: parallel data mining algorithms, parallel and distributed databases, parallel file systems, parallel I/O, tertiary storage, management of online data, support for heterogeneous data representations, security, quality of service, and pricing metrics. State-of-the-art technology in these areas is surveyed with an eye towards an integration strategy leading to a complete solution.