A Simulation Study of the CRAY X-MP Memory System
IEEE Transactions on Computers
Designing efficient algorithms for parallel computers
Program optimization for instruction caches
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Achieving high instruction cache performance with an optimizing compiler
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
IEEE Transactions on Computers
Digital neural networks
New approximation algorithms for graph coloring
Journal of the ACM (JACM)
IEEE Transactions on Parallel and Distributed Systems
Artificial Neural Networks: A Tutorial
Computer - Special issue: neural computing: companion issue to Spring 1996 IEEE Computational Science & Engineering
Minimization of Memory and Network Contention for Accessing Arbitrary Data Patterns in SIMD Systems
IEEE Transactions on Computers
Exploiting dual data-memory banks in digital signal processors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Compiler-directed page coloring for multiprocessors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Journal of Algorithms
A Heuristic Storage for Minimizing Access Time of Arbitrary Data Patterns
IEEE Transactions on Parallel and Distributed Systems
Memory systems and pipelined processors
Genetic Algorithms in Search, Optimization and Machine Learning
Computer and Robot Vision
Computer Architecture and Parallel Processing
High-Bandwidth Interleaved Memories for Vector Processors - A Simulation Study
IEEE Transactions on Computers
Block, Multistride Vector, and FFT Accesses in Parallel Memory Systems
IEEE Transactions on Parallel and Distributed Systems
Compile-Time Techniques for Improving Scalar Access Performance in Parallel Memories
IEEE Transactions on Parallel and Distributed Systems
Multiskewing - A Novel Technique for Optimal Parallel Memory Access
IEEE Transactions on Parallel and Distributed Systems
Exploiting compile-time knowledge to improve memory bandwidth can produce noticeable improvements at runtime.(1, 2) Allocating data structures(1) to separate memories whenever the data may be accessed in parallel yields improvements in memory access time of 13% to 40%. We are concerned with synthesizing compiler storage schemes that minimize array access conflicts in parallel memories for a set of compiler-predicted data access patterns. Such access patterns can easily be found for many synchronous dataflow computations, such as multimedia compression/decompression, DSP, vision, and robotics. A storage scheme is a mapping from array addresses to memory modules. Finding a conflict-free storage scheme for a set of data patterns is NP-complete; the problem is reducible to weighted graph coloring. We investigate optimizing the storage scheme using constructive heuristics, neural methods, and genetic algorithms, and present the implementation details of each approach. Using realistic data patterns, simulation shows that memory utilization of 80% or higher can be achieved for 20 data patterns over up to 256 parallel memories, i.e., a scalable parallel memory. The neural approach was notably fast at producing reasonably good solutions even for large problem sizes, and the convergence of the proposed neural algorithm appears only slightly dependent on problem size. Genetic algorithms are recommended for advanced compiler optimization, especially for large problem sizes and for applications that are compiled once and run many times over different data sets. The solutions presented are also useful for other optimization problems.
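The reduction described above can be illustrated with a minimal sketch (a hypothetical illustration, not the paper's actual heuristic): array addresses become graph nodes, an edge weight counts how many access patterns request two addresses in the same parallel cycle, and a greedy weighted coloring assigns each address to the memory module (color) that minimizes conflicts with already-placed neighbors.

```python
# Hypothetical sketch: storage-scheme synthesis as weighted graph coloring,
# solved with a simple greedy heuristic. Names and structure are illustrative.
from collections import defaultdict
from itertools import combinations

def build_conflict_graph(patterns):
    """Edge weight = number of access patterns in which two array
    addresses are requested together (a conflict if they share a module)."""
    weight = defaultdict(int)
    for pattern in patterns:
        for a, b in combinations(sorted(set(pattern)), 2):
            weight[(a, b)] += 1
    return weight

def greedy_storage_scheme(addresses, patterns, num_memories):
    """Assign each address the module (color) minimizing the total
    weighted conflict with neighbors placed so far."""
    weight = build_conflict_graph(patterns)
    degree = defaultdict(int)
    for (a, b), w in weight.items():
        degree[a] += w
        degree[b] += w
    scheme = {}
    # Place the most conflict-prone addresses first.
    for addr in sorted(addresses, key=lambda a: -degree[a]):
        cost = [0] * num_memories
        for other, module in scheme.items():
            key = (min(addr, other), max(addr, other))
            cost[module] += weight.get(key, 0)
        scheme[addr] = min(range(num_memories), key=lambda m: cost[m])
    return scheme

# Toy example: a 4x4 array accessed by whole rows and whole columns,
# distributed over 4 memory modules.
addrs = [(i, j) for i in range(4) for j in range(4)]
rows = [[(i, j) for j in range(4)] for i in range(4)]
cols = [[(i, j) for i in range(4)] for j in range(4)]
scheme = greedy_storage_scheme(addrs, rows + cols, 4)
```

On this toy instance the greedy heuristic recovers a skewed, conflict-free layout (every row and every column touches all four modules), which is the same effect a diagonal skewing scheme achieves analytically; the heuristic formulation, however, also handles arbitrary pattern sets where no closed-form skew exists.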