A Distributed Framework for Parallel Data Mining Using HPJava

Authors:
O. Rana; D. Fisk
Venue:
BT Technology Journal
Year:
1999

Abstract

Java has become a language of choice for applications executing in heterogeneous environments utilising distributed objects and multithreading. Handling large data sets requires scalable and efficient implementations of data mining approaches, which generally employ computationally intensive algorithms. Conventional Java implementations do not directly provide support for the data structures often encountered in such algorithms, and they also lack repeatability in numerical precision across platforms. This paper describes a distributed framework that employs task and data parallelism and is implemented in high performance Java (HPJava). Issues of interest for data mining algorithms are identified, and possible solutions are discussed for overcoming limitations in the Java Virtual Machine. The framework supports parallelism across workstation clusters, using the Message Passing Interface (MPI) as middleware, and can support different analysis algorithms, wrapped as Java objects and linked to various databases using the Java Database Connectivity (JDBC) interface. Guidelines are provided for implementing parallel and distributed data mining on large data sets, and a proof-of-concept data mining application is analysed using a neural network.