A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Compiling for numa parallel machines
Compiling for numa parallel machines
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Improving locality using loop and data transformations in an integrated framework
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts
IEEE Transactions on Parallel and Distributed Systems
Dynamic data distribution with control flow analysis
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Static and Dynamic Locality Optimizations Using Integer Linear Programming
IEEE Transactions on Parallel and Distributed Systems
Dependence graphs and compiler optimizations
POPL '81 Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
A Graph Based Framework to Detect Optimal Memory Layouts for Improving Data Locality
IPPS '99/SPDP '99 Proceedings of the 13th International Symposium on Parallel Processing and the 10th Symposium on Parallel and Distributed Processing
Linear analysis and optimization of stream programs
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Simulation of cloud dynamics on graphics hardware
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
Media Processing Applications on the Imagine Stream Processor
ICCD '02 Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02)
Sparse matrix solvers on the GPU: conjugate gradients and multigrid
ACM SIGGRAPH 2003 Papers
A programming system for the imagine media processor
A programming system for the imagine media processor
Programmable Stream Processors
Computer
The vlsi implementation and evaluation of area- and energy-efficient streaming media processors
The vlsi implementation and evaluation of area- and energy-efficient streaming media processors
Evaluating the Imagine Stream Architecture
Proceedings of the 31st annual international symposium on Computer architecture
Analysis and Performance Results of a Molecular Modeling Application on Merrimac
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
GPU Cluster for High Performance Computing
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Merrimac: Supercomputing with Streams
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Cache aware optimization of stream programs
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Memory hierarchy design for stream computing
Memory hierarchy design for stream computing
Compiling for stream processing
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
A 64-bit stream processor architecture for scientific applications
Proceedings of the 34th annual international symposium on Computer architecture
Merrimac: high-performance and highly-efficient scientific computing with streams
Merrimac: high-performance and highly-efficient scientific computing with streams
Executing irregular scientific applications on stream architectures
Proceedings of the 21st annual international conference on Supercomputing
Tradeoff between data-, instruction-, and thread-level parallelism in stream processors
Proceedings of the 21st annual international conference on Supercomputing
Scientific computing applications on the imagine stream processor
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Architecture-based optimization for mapping scientific applications to imagine
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
The Journal of Supercomputing
I/O streaming evaluation of batch queries for data-intensive computational turbulence
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Optimizing modulo scheduling to achieve reuse and concurrency for stream processors
The Journal of Supercomputing
Hi-index | 0.00 |
FT64 is the first 64-bit stream processor designed for scientific computing. It is critical to exploit optimizing streamization approaches for scientific applications on FT64 due to the inefficiency of direct streamization approach. In this paper, we propose a novel matrix-based streamization approach for improving locality and parallelism of scientific applications on FT64. First, a Data&Computation Matrix is built to abstract the relationship between loops and arrays of the original programs, and it is helpful for formulating the streamization problem. Second, three key techniques for optimizing streamization approach are proposed based on the transformations of the matrix, i.e., coarse-grained program transformations, fine-grained program transformations, and stream organization optimizations. Finally, we apply our approach to ten typical scientific application kernels on FT64. The experimental results show that the matrix-based streamization approach achieves an average speedup of 2.76 over the direct streamization approach, and performs equally to or better than the corresponding Fortran programs on Itanium 2 except CG. It is certain that the matrix-based streamization approach is a promising and practical solution to efficiently exploit the tremendous potential of FT64.