A simulation based study of TLB performance
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Surpassing the TLB performance of superpages with less operating system support
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Prefetching Using Markov Predictors
IEEE Transactions on Computers - Special issue on cache memory and related problems
Proceedings of the 27th annual international symposium on Computer architecture
Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Characterizing the d-TLB behavior of SPEC CPU2000 benchmarks
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Going the distance for TLB prefetching: an application-driven study
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Effective Hardware-Based Data Prefetching for High-Performance Processors
IEEE Transactions on Computers
Reevaluating Online Superpage Promotion with Hardware Support
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Practical, transparent operating system support for superpages
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
General purpose operating system support for multiple page sizes
ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference
XAMM: a high-performance automatic memory management system with memory-constrained designs
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Synergistic TLBs for High Performance Address Translation in Chip Multiprocessors
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
The working set size of Java applications on embedded systems has recently been increasing, causing the Translation Lookaside Buffer (TLB) to become a serious performance bottleneck. From a thorough analysis of the SPECjvm98 benchmark suite executing on a commodity embedded system, we find TLB misses attribute from 24% to 50% of the total execution time. We explore and evaluate a wide spectrum of TLB-enhancing techniques with different combinations of software/hardware approaches, namely superpage for reducing TLB miss rates, two-level TLB and TLB prefetching for reducing both TLB miss rates and TLB miss latency, and even a no-TLB design for removing TLB overhead completely. We adapt and then in a novel way extend these approaches to fit the design space of embedded systems executing Java code. We compare these approaches, discussing their performance behavior, software/hardware complexity and constraints, especially the design implications for the application, runtime and OS.We first conclude that even with the aggressive approaches presented, there remains a performance bottleneck with the TLB. Second, in addition to facing very different design considerations and constraints for embedded systems, proven hardware techniques, such as TLB prefetching have different performance implications. Third, software based solutions, no-TLB design and superpaging, appear to be more effective in improving Java application performance on embedded systems. Finally, beyond performance, these approaches have their respective pros and cons; it is left to the system designer to make the appropriate engineering tradeoff.