Scalable parallel data mining for association rules
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Parallel data mining for association rules on shared-memory multi-processors
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Characterization and parallelization of decision-tree induction
Journal of Parallel and Distributed Computing - Special issue on high-performance data mining
Emerging scientific applications in data mining
Communications of the ACM - Evolving data mining into solutions for insights
Parallel data mining for association rules on shared memory systems
Knowledge and Information Systems
Strategies for Parallel Data Mining
IEEE Concurrency
Scalable Parallel Clustering for Data Mining on Multicomputers
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Parallelism in Knowledge Discovery Techniques
PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing
Parallel k/h-Means Clustering for Large Data Sets
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Performance and Memory-Access Characterization of Data Mining Applications
WWC '98 Proceedings of the Workload Characterization: Methodology and Case Studies
Parallel Classification for Data Mining on Shared-Memory Multiprocessors
ICDE '99 Proceedings of the 15th International Conference on Data Engineering
Proceedings of the 30th annual international symposium on Computer architecture
ScalParC: A New Scalable and Efficient Parallel Classification Algorithm for Mining Large Datasets
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Introduction to Data Mining, (First Edition)
Introduction to Data Mining, (First Edition)
A characterization of data mining algorithms on a modern processor
DaMoN '05 Proceedings of the 1st international workshop on Data management on new hardware
Understanding the applicability of CMP performance optimizations on data mining applications
IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
Hi-index | 0.00 |
Explosive growth in the availability of various kinds of data in both commercial and scientific domains have resulted in an unprecedented need to develop novel data-driven, knowledge discovery techniques. Data mining is one such data-centric application. It consists of methods to discover interesting, nontrivial, and useful patterns hidden within massive amounts of data. Researchers from both academia and industry have recognized that the challenges of data mining applications will help shape the future of multi-core processor and parallelizing compiler designs. However, relatively little has been done to understand the performance characteristics of these applications on modern multi-core processors. The exponential growth of on-chip resources make it critical to exploit parallelism at all granularities for improving the performance of data mining applications. In this paper, we examine the instruction-level, memory-level and thread-level parallelism available in data mining applications. We observe that (i) data mining applications have a slightly different instruction mix from SPEC integer applications, and this difference can potentially lead to different ILP extraction; ii) although many data mining applications suffer from data cache miss penalty, similar to SPEC integer applications, different techniques must be developed to enable effective prefetching due to the existance of complex and irregular data structures, such as hash tables; (iii) although data mining applications have large amount of thread-level parallelism, efficient extraction of such parallelism depends on on-chip cache performance; and (iv) the performance characteristics of data mining applications can vary at runtime, and thus techniques that dynamically tune the applications to adapt to such variations are desired.