Parallel data mining techniques on Graphics Processing Unit with Compute Unified Device Architecture (CUDA)

Authors:
Liheng Jian;Cheng Wang;Ying Liu;Shenshen Liang;Weidong Yi;Yong Shi
Affiliations:
School of Information Science and Engineering, Graduate University of Chinese Academy of Sciences, Beijing, China;Agilent Technologies Co. Ltd., Beijing, China;School of Information Science and Engineering, Graduate University of Chinese Academy of Sciences, Beijing, China and Research Center on Fictitious Economy and Data Science, Chinese Academy of Sci ...;School of Information Science and Engineering, Graduate University of Chinese Academy of Sciences, Beijing, China;School of Information Science and Engineering, Graduate University of Chinese Academy of Sciences, Beijing, China;Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing, China and University of Nebraska at Omaha, Omaha, USA
Venue:
The Journal of Supercomputing
Year:
2013

Abstract

Recent developments in Graphics Processing Units (GPUs) have enabled inexpensive high-performance computing for general-purpose applications. The Compute Unified Device Architecture (CUDA) programming model provides programmers with adequate C-like APIs to better exploit the parallel power of the GPU. Data mining is widely used and has significant applications in various domains. However, current data mining toolkits cannot meet the speed requirements of applications with large-scale databases. In this paper, we propose three techniques to speed up fundamental problems in data mining algorithms on the CUDA platform: a scalable thread scheduling scheme for irregular patterns, a parallel distributed top-k scheme, and a parallel high-dimension reduction scheme. They play a key role in our CUDA-based implementations of three representative data mining algorithms: CU-Apriori, CU-KNN, and CU-K-means. These parallel implementations significantly outperform other state-of-the-art implementations on an HP xw8600 workstation with a Tesla C1060 GPU and a quad-core Intel Xeon CPU. Our results show that the GPU + CUDA parallel architecture is feasible and promising for data mining applications.
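
The abstract names a parallel distributed top-k scheme but does not describe its internals. As a rough illustration of the general partition-then-merge idea that such schemes commonly rely on (for example, when selecting the k nearest neighbours in KNN), the sketch below is written in plain Python rather than CUDA. It is an assumption-laden stand-in, not the authors' implementation: the function names `local_top_k` and `distributed_top_k`, the `num_chunks` parameter, and the use of one chunk per hypothetical thread block are all illustrative choices.

```python
import heapq


def local_top_k(chunk, k):
    """Return the k smallest (distance, index) pairs within one chunk."""
    return heapq.nsmallest(k, chunk)


def distributed_top_k(distances, k, num_chunks):
    """Partition-then-merge top-k selection (illustrative sketch only).

    Each chunk stands in for the work of one hypothetical thread block;
    on a GPU the per-chunk reductions would run concurrently.
    """
    indexed = [(d, i) for i, d in enumerate(distances)]
    size = max(1, -(-len(indexed) // num_chunks))  # ceiling division
    chunks = [indexed[s:s + size] for s in range(0, len(indexed), size)]

    # Stage 1: every chunk independently keeps only its own k best pairs.
    candidates = [pair for chunk in chunks for pair in local_top_k(chunk, k)]

    # Stage 2: merge the surviving candidates (at most k per chunk).
    return heapq.nsmallest(k, candidates)
```

Calling `distributed_top_k(distances, k=3, num_chunks=2)` on a list of distances returns the three smallest distances paired with their original positions. Because any element of the global top-k must also be in the top-k of its own chunk, the two-stage result matches a full sort while the second stage only has to examine the per-chunk survivors.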