Achieving high instruction cache performance with an optimizing compiler
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Profile guided code positioning
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
The POSTGRES next generation database management system
Communications of the ACM
Increasing the instruction fetch rate via multiple branch prediction and a branch address cache
ICS '93 Proceedings of the 7th international conference on Supercomputing
The superblock: an effective technique for VLIW and superscalar compilation
The Journal of Supercomputing - Special issue on instruction-level parallelism
Contrasting characteristics and cache performance of technical and multi-user commercial workloads
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Optimization of instruction fetch mechanisms for high issue rates
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Multiple-block ahead branch predictors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Efficient procedure mapping using cache line coloring
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Alternative fetch and issue policies for the trace cache fetch mechanism
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Procedure placement using temporal ordering information
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory system characterization of commercial workloads
Proceedings of the 25th annual international symposium on Computer architecture
Performance characterization of a Quad Pentium Pro SMP using OLTP workloads
Proceedings of the 25th annual international symposium on Computer architecture
An analysis of database workload performance on simultaneous multithreaded processors
Proceedings of the 25th annual international symposium on Computer architecture
Dynamic history-length fitting: a third level of adaptivity for branch prediction
Proceedings of the 25th annual international symposium on Computer architecture
Performance of database workloads on shared-memory systems with out-of-order processors
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
ICS '99 Proceedings of the 13th international conference on Supercomputing
The Effect of Code Expanding Optimizations on Instruction Cache Design
IEEE Transactions on Computers
Optimizing instruction cache performance for operating system intensive workloads
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
The Memory Performance of DSS Commercial Workloads in Shared-Memory Multiprocessors
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Temporal-Based Procedure Reordering for Improved Instruction Cache Performance
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
The Effect of Program Optimization on Trace Cache Efficiency
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Optimization of Instruction Fetch for Decision Support Workloads
ICPP '99 Proceedings of the 1999 International Conference on Parallel Processing
Combining code reordering and cache configuration
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
In this paper we address the important problem of instruction fetch for future wide issue superscalar processors. Our approach focuses on understanding the interaction between software and hardware techniques targeting an increase in the instruction fetch bandwidth. That is the objective, for instance, of the Hardware Trace Cache (HTC). We design a profile based code reordering technique which targets a maximization of the sequentiality of instructions, while still trying to minimize instruction cache misses. We call our software approach, Software Trace Cache (STC). We evaluate our software approach, and then compare it with the HTC and the combination of both techniques. Our results on PostgreSQL show that for large codes with few loops and deterministic execution sequences the STC offers better results than a HTC. Also, both the software and hardware approaches combine well to obtain improved results.