Parallel programming with MPI
Precise miss analysis for program transformations with caches of arbitrary associativity
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Parallel programming in OpenMP
Parallel programming in OpenMP
FLASH vs. (Simulated) FLASH: closing the simulation loop
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Combined partitioning and data padding for scheduling multiple loop nests
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Large System Performance of SPEC OMP2001 Benchmarks
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Efficient memory simulation in SimICS
SS '95 Proceedings of the 28th Annual Simulation Symposium
Using complete machine simulation to understand computer system behavior
Using complete machine simulation to understand computer system behavior
IATAC: a smart predictor to turn-off L2 cache lines
ACM Transactions on Architecture and Code Optimization (TACO)
Organizing the Last Line of Defense before Hitting the Memory Wall for CMPs
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Fast configurable-cache tuning with a unified second-level cache
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
A non-uniform cache architecture for low power system design
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Static cache partitioning robustness analysis for embedded on-chip multi-processors
Proceedings of the 3rd conference on Computing frontiers
Evaluation of the field-programmable cache: performance and energy consumption
Proceedings of the 3rd conference on Computing frontiers
Dynamically reconfigurable cache architecture using adaptive block allocation policy
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Hi-index | 0.00 |
With the trends of microprocessor design towards multicore, cache performance becomes more important because an off-chip access would be increasingly expensive due to the competition across the processor cores. A question arises: How to design the cache architecture to prevent a performance bottleneck caused by data accesses? This work studies a reconfigurable cache architecture that can be dynamically configured for meeting the individual demand of running applications. Using a self-developed cache simulator, we first examined how different cache organization and configuration influence the parallel execution of OpenMP applications. The experimental results show that applications benefit from a flexible cache with reconfigurability. This motivated us to go a step further and develop a hardware prototype of this novel architecture.