A Knowledge Discovery in Databases (KDD) process involving large amounts of data requires considerable computational power. The process may be run on dedicated, expensive machinery or, for some tasks, with distributed computing techniques on a network of affordable machines. In either approach, the user usually specifies the workflow of the sub-tasks composing the whole KDD process before execution starts. In this paper we propose a technique that we call Distributed Generative Data Mining. The technique is generative in that it can create new sub-tasks of the Data Mining (DM) analysis process at execution time. The workflow of DM sub-tasks is therefore dynamic. To deploy the proposed technique, we extended the Distributed Data Mining system HARVARD and adapted an Inductive Logic Programming system (IndLog) used in a Relational Data Mining task. As a proof of concept, the extended system was used to analyse an artificial dataset for a credit scoring problem with eighty million records.
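The idea of a dynamic workflow, where running a sub-task may generate further sub-tasks, can be illustrated with a minimal single-process sketch. This is not the HARVARD or IndLog mechanism, which the abstract does not detail. The function names, the `(name, n_records)` task shape, and the splitting rule with its 20-million-record threshold are all hypothetical and chosen only for illustration.

```python
from collections import deque


def run_generative_workflow(initial_tasks, execute):
    """Process tasks from a queue; each executed task may return new sub-tasks.

    The set of tasks is not fixed up front: it grows while the workflow runs.
    """
    queue = deque(initial_tasks)
    completed = []
    while queue:
        task = queue.popleft()
        new_tasks = execute(task)   # executing a task may generate sub-tasks
        completed.append(task)
        queue.extend(new_tasks)     # newly generated sub-tasks join the workflow
    return completed


def split_until_small(task, max_records=20_000_000):
    """Hypothetical rule: a partition above max_records spawns two halves."""
    name, n_records = task
    if n_records <= max_records:
        return []                   # small enough: no further sub-tasks
    half = n_records // 2
    return [(name + ".a", half), (name + ".b", n_records - half)]


# Start from one task covering all eighty million records.
completed = run_generative_workflow([("credit", 80_000_000)], split_until_small)
# Tasks that were small enough to be analysed without further splitting.
leaves = [t for t in completed if t[1] <= 20_000_000]
```

In a distributed setting the queue would presumably be held by a coordinator and the tasks dispatched to worker machines; the sketch keeps everything in one process to show only the task-generation loop.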