PaDDMAS: Parallel and Distributed Data Mining Application Suite

Venue: IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
Year: 2000

Cited by (6):

KNOWLEDGE GRID: High Performance Knowledge Discovery on the Grid
GRID '01 Proceedings of the Second International Workshop on Grid Computing

A Data Mining Architecture for Distributed Environments
IICS '02 Proceedings of the Second International Workshop on Innovative Internet Computing Systems

A Data Mining Architecture for Clustered Environments
PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing

Parallel Data Mining Experimentation Using Flexible Configurations
TSCTC '02 Proceedings of the Third International Conference on Rough Sets and Current Trends in Computing

Grid implementation of the Apriori algorithm
Advances in Engineering Software

Developing an open knowledge discovery support system for network environment
CTS'05 Proceedings of the 2005 international conference on Collaborative technologies and systems

Abstract

Discovering complex associations, anomalies and patterns in distributed data sets is gaining popularity in a range of scientific, medical and business applications. Various algorithms are employed to perform data analysis within a domain, ranging from statistical to machine learning and AI-based techniques. Several issues must be addressed, however, to scale such approaches to large data sets, particularly when they are applied to data distributed across various sites. As new analysis techniques are identified, the core tool set must enable easy integration of such analytical components. Similarly, results from analysis engines [...] of results.

We describe the architecture of PaDDMAS, a component-based system for developing distributed data mining applications. PaDDMAS provides a tool set for combining pre-developed or custom components using a dataflow approach, with components performing analysis, data extraction, or data management and translation. Each component is wrapped as a Java/CORBA object and has an interface defined in XML. Components can be serial or parallel objects, and may be binary or contain a more complex internal structure. We demonstrate a prototype using a neural network analysis algorithm.
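
The dataflow idea in the abstract (components with XML-described interfaces, wired output to input) can be illustrated with a minimal sketch. This is not PaDDMAS code: the abstract does not give the XML schema or the Java/CORBA wrapper API, so the element names (`component`, `inport`, `outport`), the `Component` class, and the `load_component` and `connect` helpers below are all hypothetical. Plain Python functions stand in for the wrapped objects, and a simple threshold step stands in for an analysis component such as the neural network mentioned above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface descriptions. The element and attribute names are
# invented for this sketch; they are NOT the PaDDMAS XML schema.
NORMALISE_XML = """
<component name="normalise" kind="translation">
  <inport name="rows" type="list-of-float"/>
  <outport name="scaled" type="list-of-float"/>
</component>
"""

THRESHOLD_XML = """
<component name="threshold" kind="analysis">
  <inport name="scaled" type="list-of-float"/>
  <outport name="flags" type="list-of-bool"/>
</component>
"""


@dataclass
class Component:
    """A component: metadata read from XML plus the code that does the work."""
    name: str
    kind: str
    in_type: str
    out_type: str
    run: Callable


def load_component(xml_text: str, run: Callable) -> Component:
    """Build a Component from its (hypothetical) XML interface description."""
    root = ET.fromstring(xml_text)
    return Component(
        name=root.get("name"),
        kind=root.get("kind"),
        in_type=root.find("inport").get("type"),
        out_type=root.find("outport").get("type"),
        run=run,
    )


def connect(components: List[Component]) -> Callable:
    """Check that adjacent port types match, then return the composed flow."""
    for up, down in zip(components, components[1:]):
        if up.out_type != down.in_type:
            raise TypeError(
                f"{up.name} outputs {up.out_type}, "
                f"but {down.name} expects {down.in_type}"
            )

    def pipeline(data):
        # Push the data through each component in order.
        for component in components:
            data = component.run(data)
        return data

    return pipeline


def normalise(rows):
    """Translation step: scale values by the maximum (assumes a positive max)."""
    top = max(rows)
    return [r / top for r in rows]


def threshold(scaled):
    """Stand-in analysis step: flag values above 0.5."""
    return [s > 0.5 for s in scaled]


flow = connect([
    load_component(NORMALISE_XML, normalise),
    load_component(THRESHOLD_XML, threshold),
])
print(flow([2.0, 8.0, 10.0]))  # prints [False, True, True]
```

Checking port types when components are connected, rather than when data arrives, is one way a tool set could reject an invalid composition before any remote execution starts; whether PaDDMAS performs such a check is not stated in the abstract.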