Distributed generative data mining

Authors:
Ruy Ramos;Rui Camacho
Affiliations:
LIACC, Porto, Portugal and FEUP, Porto, Portugal;LIACC, Porto, Portugal and FEUP, Porto, Portugal
Venue:
ICDM'07 Proceedings of the 7th industrial conference on Advances in data mining: theoretical aspects and applications
Year:
2007

Abstract

A Knowledge Discovery in Databases (KDD) process involving large amounts of data requires considerable computational power. The process may run on dedicated, expensive machinery or, for some tasks, on a network of affordable machines using distributed computing techniques. In either approach the user usually specifies the workflow of the sub-tasks composing the whole KDD process before execution starts. In this paper we propose a technique that we call Distributed Generative Data Mining. The technique is generative in that it can create new sub-tasks of the Data Mining (DM) analysis process at execution time. The workflow of DM sub-tasks is therefore dynamic. To deploy the proposed technique we extended the Distributed Data Mining system HARVARD and adapted an Inductive Logic Programming system (IndLog) used in a Relational Data Mining task. As a proof of concept, the extended system was used to analyse an artificial dataset of a credit scoring problem with eighty million records.
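
The idea of a workflow that grows at execution time can be illustrated with a minimal sketch. It is not the HARVARD or IndLog implementation: the `Task` type, the `run_generative` function, and the rule that splits a task when its record range exceeds `max_chunk` are all hypothetical, chosen only to show a task queue whose contents are not fixed before execution starts. The sketch runs in a single process; distributing the queue over worker machines is not modelled.

```python
from collections import deque
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """Hypothetical sub-task covering records [lo, hi) of a dataset."""
    name: str
    lo: int
    hi: int


def run_generative(root: Task, max_chunk: int) -> List[str]:
    """Run tasks from a queue; a task may generate new sub-tasks at run time.

    Assumption for illustration: a task that covers more than `max_chunk`
    records is not analysed directly but replaced by two half-sized
    sub-tasks. Returns the names of the tasks that were actually analysed.
    """
    queue = deque([root])
    executed: List[str] = []
    while queue:
        task = queue.popleft()
        size = task.hi - task.lo
        if size > max_chunk:
            # The workflow grows here: new sub-tasks appear during execution.
            mid = task.lo + size // 2
            queue.append(Task(task.name + ".0", task.lo, mid))
            queue.append(Task(task.name + ".1", mid, task.hi))
        else:
            # Placeholder for the real analysis of this slice of records.
            executed.append(task.name)
    return executed


if __name__ == "__main__":
    print(run_generative(Task("t", 0, 8), max_chunk=2))
```

Only the root task is known when the run starts; every other task is created while the queue is being processed, which is the sense in which the workflow is dynamic rather than specified up front.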