The Roofline model offers insight on how to improve the performance of software and hardware.