The NAS parallel benchmarks—summary and preliminary results
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
A safe approximate algorithm for interprocedural aliasing
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Interprocedural may-alias analysis for pointers: beyond k-limiting
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Efficient context-sensitive pointer analysis for C programs
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Efficient and precise array access analysis
ACM Transactions on Programming Languages and Systems (TOPLAS)
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Effective Hardware-Based Data Prefetching for High-Performance Processors
IEEE Transactions on Computers
Scratchpad memory: design alternative for cache on-chip memory in embedded systems
Proceedings of the tenth international symposium on Hardware/software codesign
Optimizing Compiler for the CELL Processor
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
The Cell Processor Architecture
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Quantifying Locality In The Memory Access Patterns of HPC Applications
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Comparing memory systems for chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
IEEE Transactions on Computers
Prefetching irregular references for software cache on cell
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Hybrid access-specific software cache techniques for the cell BE architecture
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
On the Effects of Memory Latency and Bandwidth on Supercomputer Application Performance
IISWC '07 Proceedings of the 2007 IEEE 10th International Symposium on Workload Characterization
Local Memory Design Space Exploration for High-Performance Computing
The Computer Journal
Hi-index | 0.00 |
Cache coherence protocols limit the scalability of chip multiprocessors. One solution is to introduce a local memory alongside the cache hierarchy, forming a hybrid memory system. Local memories are more power-efficient than caches and they do not generate coherence traffic but they suffer from poor programmability. When non-predictable memory access patterns are found compilers do not succeed in generating code because of the incoherency between the two storages. This paper proposes a coherence protocol for hybrid memory systems that allows the compiler to generate code even in the presence of memory aliasing problems. Coherency is ensured by a simple software/hardware co-design where the compiler identifies potentially incoherent memory accesses and the hardware diverts them to the correct copy of the data. The coherence protocol introduces overheads of 0.24% in execution time and of 1.06% in energy consumption to enable the usage of the hybrid memory system.