Evaluating Associativity in CPU Caches
IEEE Transactions on Computers
Implementing stack simulation for highly-associative memories
SIGMETRICS '91 Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Set-associative cache simulation using generalized binomial trees
ACM Transactions on Computer Systems (TOCS)
Selective cache ways: on-demand cache resource allocation
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Cache miss equations: a compiler framework for analyzing and tuning memory behavior
ACM Transactions on Programming Languages and Systems (TOPLAS)
Application-specific memory management for embedded systems using software-controlled caches
Proceedings of the 37th Annual Design Automation Conference
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Reconfigurable caches and their application to media processing
Proceedings of the 27th annual international symposium on Computer architecture
A design framework to efficiently explore energy-delay tradeoffs
Proceedings of the ninth international symposium on Hardware/software codesign
Compiler-directed scratch pad memory hierarchy design and management
Proceedings of the 39th annual Design Automation Conference
Scratchpad memory: design alternative for cache on-chip memory in embedded systems
Proceedings of the tenth international symposium on Hardware/software codesign
A fast and accurate framework to analyze and optimize cache memory behavior
ACM Transactions on Programming Languages and Systems (TOPLAS)
High level cache simulation for heterogeneous multiprocessors
Proceedings of the 41st annual Design Automation Conference
Analytical Design Space Exploration of Caches for Embedded Systems
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Finding optimal L1 cache configuration for embedded systems
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
StatCache: a probabilistic approach to efficient and accurate data locality analysis
ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
Exact and fast L1 cache simulation for embedded systems
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
SuSeSim: a fast simulation strategy to find optimal L1 cache configuration for embedded systems
CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Evaluation techniques for storage hierarchies
IBM Systems Journal
ACCESS: Smart scheduling for asymmetric cache CMPs
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
An energy-efficient adaptive hybrid cache
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
A reuse-aware prefetching scheme for scratchpad memory
Proceedings of the 48th Design Automation Conference
The configuration of L1 caches has a significant impact on the performance and energy consumption of an embedded system. Normally, an embedded system is designed for a specific application or a domain of applications. Simulating the application(s) is the most popular way to find the optimal L1 cache configuration. However, the simulation-based approach suffers from long simulation times because all configurations must be simulated exhaustively. Each configuration is characterized by three parameters: the number of cache sets, the associativity, and the cache line size. In previous work, the most time-consuming part was determining the hit or miss status of a cache access under each configuration, which was done by a linear search on a long linked list based on the inclusion property. In this work, we propose a novel simulator, HC-Sim, which adopts elaborate data structures, a centralized hash table, and a novel miss counter structure, to effectively reduce the search time. On average, we achieve a 2.56X speedup compared to the fastest existing approach (SuSeSim). In addition, we implement HC-Sim using the dynamic binary instrumentation tool Pin. This provides scalability for simulating larger applications by eliminating the overhead of generating and storing a huge trace file. Furthermore, HC-Sim provides the ability to simulate an L1 cache and a scratchpad memory (SPM) simultaneously. This helps designers explore the design space considering both L1 cache configurations and SPM sizes.
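To make the inclusion property mentioned in the abstract concrete, the following is a minimal sketch of classic single-pass LRU stack simulation, not HC-Sim's actual implementation. It assumes LRU replacement and a fixed number of sets and line size, and it uses the linear per-set search that the abstract identifies as the bottleneck of earlier simulators. The function name `simulate_all_associativities` and its parameters are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def simulate_all_associativities(addresses, num_sets, line_size, max_assoc):
    """Count misses for LRU caches with 1..max_assoc ways in a single pass.

    Inclusion property (LRU, fixed set count and line size): a block found
    at depth d of its set's LRU stack hits in every cache with more than d ways.
    """
    stacks = defaultdict(list)       # set index -> block numbers, most recent first
    hits_at_depth = [0] * max_assoc  # hits_at_depth[d] = accesses found at depth d
    total = 0

    for addr in addresses:
        total += 1
        block = addr // line_size
        stack = stacks[block % num_sets]
        if block in stack:
            depth = stack.index(block)  # linear search, the costly step
            stack.pop(depth)
            hits_at_depth[depth] += 1
        stack.insert(0, block)          # block becomes most recently used
        del stack[max_assoc:]           # deeper entries miss for every tracked size

    misses = {}
    hits = 0
    for assoc in range(1, max_assoc + 1):
        hits += hits_at_depth[assoc - 1]
        misses[assoc] = total - hits
    return misses


if __name__ == "__main__":
    # One set, one-byte lines, cyclic access to three blocks.
    print(simulate_all_associativities([0, 1, 2, 0, 1, 2, 0], 1, 1, 3))
    # {1: 7, 2: 7, 3: 3}
```

A single pass over the trace yields miss counts for every associativity up to `max_assoc`. Covering other set counts and line sizes would require repeating or extending the pass, and the search cost in that extended setting is what the abstract says HC-Sim's hash table and miss counter structure are designed to reduce.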