Parallelizing a Defect Detection and Categorization Application

Authors:
Leonid Glimcher;Gagan Agrawal;Sameep Mehta;Ruoming Jin;Raghu Machiraju
Affiliations:
Ohio State University, Columbus OH;Ohio State University, Columbus OH;Ohio State University, Columbus OH;Ohio State University, Columbus OH;Ohio State University, Columbus OH
Venue:
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Year:
2005

Abstract

This paper presents a case study in creating a parallel and scalable implementation of a scientific data analysis application. We focus on a defect detection and categorization application that analyzes datasets produced by Molecular Dynamics (MD) simulations. In parallelizing this application, we had three goals. First, we wanted to achieve high parallel efficiency. Second, we wanted an implementation that can scale to disk-resident datasets. Third, we wanted an implementation that is easy to maintain and modify, which is possible only by using high-level interfaces. We used a number of techniques for organizing the input data, achieving load balance, and efficiently parallelizing the step that updates and matches against the defect catalog. To meet our third goal, we used a system called FREERIDE (FRamework for Rapid Implementation of Datamining Engines), which was originally developed for parallelizing data mining algorithms. We have carried out a detailed evaluation of our implementation. The main observations from our experiments are as follows: 1) our implementation achieves high parallel efficiency, 2) the execution time remains proportional to the amount of computation even as the dataset becomes disk-resident, and 3) our load-balancing scheme and our method for parallelizing the updating and matching of the defect catalog are crucial to the parallel efficiency of the defect categorization phase.