On the effective bandwidth of interleaved memories in vector processor systems
IEEE Transactions on Computers
Hardware support for large atomic units in dynamically scheduled machines
MICRO 21 Proceedings of the 21st annual workshop on Microprogramming and microarchitecture
IEEE Transactions on Computers
High-bandwidth data memory systems for superscalar processors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Pseudo-randomly interleaved memory
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Distributed storage control unit for the Hitachi S-3800 multivector supercomputer
ICS '94 Proceedings of the 8th international conference on Supercomputing
A fill-unit approach to multiple instruction issue
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Increasing cache port efficiency for dynamic superscalar microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
High-bandwidth address translation for multiple-issue processors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Data caches for superscalar processors
ICS '97 Proceedings of the 11th international conference on Supercomputing
Advanced performance features of the 64-bit PA-8000
COMPCON '95 Proceedings of the 40th IEEE Computer Society International Conference
Decoupled access/execute computer architectures
ISCA '82 Proceedings of the 9th annual symposium on Computer Architecture
Evaluation of high performance multicache parallel texture mapping
ICS '98 Proceedings of the 12th international conference on Supercomputing
Speculation techniques for improving load related instruction scheduling
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Decoupling local variable accesses in a wide-issue superscalar processor
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Adding a vector unit to a superscalar processor
ICS '99 Proceedings of the 13th international conference on Supercomputing
Instruction fetch mechanisms for multipath execution processors
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Access region locality for high-bandwidth processor memory system design
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
A cost effective architecture for vectorizable numerical and multimedia applications
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
A High-Bandwidth Memory Pipeline for Wide Issue Processors
IEEE Transactions on Computers
Execution history guided instruction prefetching
ICS '02 Proceedings of the 16th international conference on Supercomputing
Speculative dynamic vectorization
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Reducing the complexity of the register file in dynamic superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Execution History Guided Instruction Prefetching
The Journal of Supercomputing
Control-Flow Independence Reuse via Dynamic Vectorization
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Scalable cache memory design for large-scale SMT architectures
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
The TM3270 Media-Processor Data Cache
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Speculative execution for hiding memory latency
MEDEA '04 Proceedings of the 2004 workshop on MEmory performance: DEaling with Applications , systems and architecture
Exploiting the replication cache to improve performance for multiple-issue microprocessors
ACM SIGARCH Computer Architecture News - Special issue: MEDEA 2004 workshop
Exploiting the replication cache to improve cache read bandwidth cost effectively
MEDEA '05 Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications , systems and architecture
Investigating cache energy and latency break-even points in high performance processors
MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
I-cache multi-banking and vertical interleaving
Proceedings of the 17th ACM Great Lakes symposium on VLSI
Unified microprocessor core storage
Proceedings of the 4th international conference on Computing frontiers
Investigating cache energy and latency break-even points in high performance processors
ACM SIGARCH Computer Architecture News
Parallel Memory Architecture for Application-Specific Instruction-Set Processors
Journal of Signal Processing Systems
Access region cache with register guided memory reference partitioning
Journal of Systems Architecture: the EUROMICRO Journal
Parallel memory architecture for TTA processor
SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
The bandwidth expansion effectiveness of cache levels block prefetch
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
An Efficient Memory Organization for High-ILP Inner Modem Baseband SDR Processors
Journal of Signal Processing Systems
Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Dynamic partition of memory reference instructions – a register guided approach
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Hot-and-Cold: using criticality in the design of energy-efficient caches
PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
APC: a performance metric of memory systems
ACM SIGMETRICS Performance Evaluation Review
Virtually split cache: An efficient mechanism to distribute instructions and data
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Highly aggressive multi-issue processor designs of the past few years and projections for the decade, require that we redesign the operation of the cache memory system. The number of instructions that must be processed (including incorrectly predicted ones) will approach 16 or more per cycle. Since memory operations account for about a third of all instructions executed, these systems will have to support multiple data references per cycle. In this paper, we explore reference stream characteristics to determine how best to meet the need for ever increasing access rates. We identify limitations of existing multi-ported cache designs and propose a new structure, the Locally-Based Interleaved Cache (LBIC), to exploit the characteristics of the data reference stream while approaching the economy of traditional multi-bank cache design. Experimental results show that the LBIC structure is capable of outper forming current multi-ported approaches.