In this paper, we revisit the problem of performance prediction on SMP machines, motivated by the need to select a parallelization strategy for random write reductions. Such reductions arise frequently in data mining algorithms. In our previous work, we developed a number of techniques for parallelizing this class of reductions and showed that each of three techniques (full replication, optimized full locking, and cache-sensitive) can outperform the others, depending on problem, dataset, and machine parameters. An important question, therefore, is: "Can we predict the performance of these techniques for a given problem, dataset, and machine?" This paper addresses the question by developing an analytical performance model that captures a two-level cache, coherence cache misses, TLB misses, locking overheads, and contention for memory. The analytical model is combined with results from micro-benchmarking to predict performance on real machines. We have validated the model on two different SMP machines. Our results show that it effectively captures the impact of the memory hierarchy (two-level cache and TLB) as well as the factors that limit parallelism (contention for locks, memory contention, and coherence cache misses). The difference between predicted and measured performance is within 20% in almost all cases. Moreover, the model is quite accurate in predicting the relative performance of the three parallelization techniques.
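To make the trade-off concrete, the following is a minimal Python sketch of a random write reduction, in which the element each update touches is known only at run time. It contrasts two strategies: private per-thread copies merged at the end (full replication) and a single shared copy guarded by one lock per element (a plain locking scheme). All function and variable names are hypothetical and are not taken from the paper. The sketch does not reproduce the memory layouts that distinguish the optimized full locking and cache-sensitive techniques, and it illustrates correctness only, not performance.

```python
import threading


def run_workers(worker, num_threads):
    """Start one thread per worker id and wait for all of them to finish."""
    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def reduce_full_replication(items, num_bins, num_threads):
    """Each thread updates a private copy; copies are merged afterwards."""
    copies = [[0] * num_bins for _ in range(num_threads)]

    def worker(tid):
        local = copies[tid]
        for idx, val in items[tid::num_threads]:
            local[idx] += val  # private copy, so no lock is needed

    run_workers(worker, num_threads)
    # Merge step: cost grows with num_threads * num_bins.
    return [sum(column) for column in zip(*copies)]


def reduce_with_locks(items, num_bins, num_threads):
    """All threads update one shared copy, with one lock per element."""
    shared = [0] * num_bins
    locks = [threading.Lock() for _ in range(num_bins)]

    def worker(tid):
        for idx, val in items[tid::num_threads]:
            with locks[idx]:  # the locked element is data-dependent
                shared[idx] += val

    run_workers(worker, num_threads)
    return shared


def reduce_sequential(items, num_bins):
    """Reference result computed without any parallelism."""
    out = [0] * num_bins
    for idx, val in items:
        out[idx] += val
    return out
```

Replication avoids locking but pays for extra memory and a merge whose cost grows with the number of threads and the size of the reduction object, whereas locking keeps a single copy at the price of lock overhead and contention. These are the kinds of costs the abstract says the analytical model accounts for.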