Compiler and middleware support for scalable data mining

  • Authors:
  • Gagan Agrawal;Ruoming Jin;Xiaogang Li

  • Affiliations:
  • Department of Computer and Information Sciences, University of Delaware, Newark, DE;Department of Computer and Information Sciences, University of Delaware, Newark, DE;Department of Computer and Information Sciences, University of Delaware, Newark, DE

  • Venue:
  • LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
  • Year:
  • 2001

Abstract

The parallelizing compiler community has traditionally focused its efforts on scientific applications. This paper gives an overview of a compiler/runtime project targeting parallel and scalable execution of data mining algorithms. To the best of our knowledge, this is the first project with such a focus. Data mining is the process of analyzing large datasets to extract novel and useful patterns or models. Though a lot of effort has been put into developing parallel algorithms for data mining tasks, the expertise and effort currently required to implement, maintain, and performance-tune a parallel data mining application are an impediment to the wide use of parallel computers for data mining. We have developed a data parallel dialect of Java that can be used for expressing common data mining algorithms at a high level. Our compiler generates a middleware specification from this dialect of Java. The middleware supports both distributed memory and shared memory parallelization, and performs a number of I/O optimizations to support efficient processing of disk-resident datasets. Our final goal is to start from declarative mining operators and translate them to data parallel Java. In this paper, we describe the commonality among different data mining algorithms, the middleware and its interface, the data parallel dialect of Java, and the compilation techniques required for generating the middleware specification. Experimental evaluations of the middleware and the compiler are also presented.