We show how to apply a structured parallel programming (SPP) methodology based on skeletons to data mining (DM) problems, and we report results for three commonly used mining techniques: association rules, decision tree induction, and spatial clustering. We analyze the structural patterns common to these applications, looking at both application performance and software engineering efficiency. Our aim is to state clearly which features an SPP environment should have to be useful for parallel DM. Within SkIE, the skeleton-based parallel programming environment (PPE) that we have developed, we study the data access patterns of parallel implementations of Apriori, C4.5, and DBSCAN. To allow efficient development of applications on huge databases, we need to address reads of large partitions, frequent and sparse access to small blocks, and an irregular mix of small and large transfers. We examine the addition of an object/component interface to the skeleton-structured model, to simplify the development of environment-integrated parallel DM applications.
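As a rough illustration of the skeleton idea applied to association rule mining, the sketch below farms a counting step over independent partitions of transactions and then reduces the partial counts. It is not SkIE code and does not reproduce the implementations discussed here: the names `farm`, `count_pairs`, and `frequent_pairs` are hypothetical, the candidate generation is limited to item pairs, and Python threads are used only to show the program structure, not to obtain real speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def farm(worker, tasks, n_workers=4):
    """Hypothetical 'farm' skeleton: run worker on independent tasks in parallel."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(worker, tasks))


def count_pairs(partition):
    """Count every 2-itemset occurring in one partition of transactions."""
    counts = Counter()
    for transaction in partition:
        for pair in combinations(sorted(set(transaction)), 2):
            counts[pair] += 1
    return counts


def frequent_pairs(partitions, min_support):
    """Farm the counting over partitions, then reduce the partial counts."""
    total = Counter()
    for partial in farm(count_pairs, partitions):
        total.update(partial)
    return {pair: n for pair, n in total.items() if n >= min_support}


# Toy database split into two partitions (made-up data for illustration).
partitions = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["a", "b"]],
]
result = frequent_pairs(partitions, min_support=2)
print(result)
```

The point of the structure is that the application code (`count_pairs`) stays sequential, while the parallel coordination lives entirely in the reusable `farm` function; each worker reads one large partition, which corresponds to the first of the data access patterns named above.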