A parallel implementation of the k nearest neighbours classifier in three levels: threads, MPI processes and the grid

Authors:
G. Aparício;I. Blanquer;V. Hernández
Affiliations:
Instituto de las Aplicaciones de las Tecnologías de la Información y Comunicaciones Avanzadas, Universidad Politécnica de Valencia, Valencia, Spain;Instituto de las Aplicaciones de las Tecnologías de la Información y Comunicaciones Avanzadas, Universidad Politécnica de Valencia, Valencia, Spain;Instituto de las Aplicaciones de las Tecnologías de la Información y Comunicaciones Avanzadas, Universidad Politécnica de Valencia, Valencia, Spain
Venue:
VECPAR'06 Proceedings of the 7th international conference on High performance computing for computational science
Year:
2006

Citing 2
Cited 1

MPI-The Complete Reference, Volume 1: The MPI Core

The Anatomy of the Grid: Enabling Scalable Virtual Organizations

International Journal of High Performance Computing Applications

Interactive data mining on a CBEA cluster

HPCS'09 Proceedings of the 23rd international conference on High Performance Computing Systems and Applications


Abstract

This paper addresses data mining and classification of large data sets using the k nearest neighbours (KNN) classifier [1]. The high computing demand of this process is met with a parallel implementation designed specifically for Grid environments built from multiprocessor computer farms. No single parallel computing approach (intra-node, inter-node or inter-organisation) is sufficient on its own to meet the computing demand of a problem of this size. Rather than applying these techniques separately, we propose combining all three, matching each one to the grain of parallelism of the different parts of the problem. The main goal is to complete a job that requires one month of CPU time in a few hours. The technologies used are the EGEE Grid computing infrastructure running the Large Hadron Collider Computing Grid (LCG 2.6) middleware [3], MPI [4] [5] and POSIX threads [6]. Finally, we compare our results with those obtained using the most widely used tools, to show the importance of this strategy.
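
As an illustration of the finest of the three levels (intra-node, thread-level parallelism), the following minimal sketch splits the query points among worker threads, each of which classifies its share against the full training set. It is not the authors' implementation: the function names (`knn_classify`, `classify_chunk`, `parallel_knn`), the Euclidean distance, the majority vote and the chunking scheme are all assumptions made for the example, and Python's thread pool stands in for the POSIX threads named in the abstract. Pure-Python threads do not run CPU-bound code simultaneously, so the sketch shows only the decomposition, not the speed-up.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def knn_classify(query, train_points, train_labels, k):
    """Label one query by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query (assumed metric).
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(query, train_points[i]))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def classify_chunk(chunk, train_points, train_labels, k):
    """Classify a contiguous chunk of queries; one chunk is one thread's work."""
    return [knn_classify(q, train_points, train_labels, k) for q in chunk]


def parallel_knn(queries, train_points, train_labels, k=3, workers=4):
    """Split the queries into chunks and classify the chunks in a thread pool."""
    size = max(1, math.ceil(len(queries) / workers))
    chunks = [queries[i:i + size] for i in range(0, len(queries), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so labels line up with the queries.
        parts = pool.map(
            lambda c: classify_chunk(c, train_points, train_labels, k), chunks)
        return [label for part in parts for label in part]


# Hypothetical toy data: two well-separated classes in the plane.
train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["a", "a", "a", "b", "b", "b"]
queries = [(0.2, 0.1), (5.5, 5.2), (1, 1), (4, 4)]
labels = parallel_knn(queries, train_points, train_labels, k=3, workers=2)
```

Because each query is classified independently, the same split can in principle be repeated at the coarser levels the abstract names, with MPI processes each taking a block of queries and Grid jobs each taking a larger block. How the paper actually assigns work to each level is not stated in the abstract.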