Performance of various computers using standard linear equations software in a FORTRAN environment
ACM SIGARCH Computer Architecture News
Improving register allocation for subscripted variables
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Analyzing and visualizing performance of memory hierarchies
Parallel computer systems
Data cache performance of supercomputer applications
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Blocking Linear Algebra Codes for Memory Hierarchies
Proceedings of the Fourth SIAM Conference on Parallel Processing for Scientific Computing
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Register allocation & spilling via graph coloring
SIGPLAN '82 Proceedings of the 1982 SIGPLAN symposium on Compiler construction
Software methods for improvement of cache performance on supercomputer applications
Software methods for improvement of cache performance on supercomputer applications
Data access microarchitectures for superscalar processors with compiler-assisted data prefetching
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Software support for speculative loads
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Cooperative shared memory: software and hardware for scalable multiprocessor
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Closing the window of vulnerability in multiphase memory transactions
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
An efficient architecture for loop based data preloading
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Improving the cache locality of memory allocation
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Cooperative shared memory: software and hardware for scalable multiprocessors
ACM Transactions on Computer Systems (TOCS)
Limitations of cache prefetching on a bus-based multiprocessor
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
ICS '93 Proceedings of the 7th international conference on Supercomputing
A distributed shared memory multiprocessor ASURA: memory and cache architecture
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Memory access coalescing: a technique for eliminating redundant memory accesses
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Data and program restructuring of irregular applications for cache-coherent multiprocessor
ICS '94 Proceedings of the 8th international conference on Supercomputing
Improving performance of linear algebra algorithms for dense matrices, using algorithmic prefetch
IBM Journal of Research and Development
Reducing cache conflicts in data cache prefetching
ACM SIGARCH Computer Architecture News - Special issue on input/output in parallel computer systems
Evaluating stream buffers as a secondary cache replacement
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Effective cache prefetching on bus-based multiprocessors
ACM Transactions on Computer Systems (TOCS)
A study of integrated prefetching and caching strategies
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Hardware implementation issues of data prefetching
ICS '95 Proceedings of the 9th international conference on Supercomputing
Scalar processor of the VPP500 parallel supercomputer
ICS '95 Proceedings of the 9th international conference on Supercomputing
A proposal of self-cleanup cache
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
A limit study of local memory requirements using value reuse profiles
Proceedings of the 28th annual international symposium on Microarchitecture
SPAID: software prefetching in pointer- and call-intensive environments
Proceedings of the 28th annual international symposium on Microarchitecture
Cache miss heuristics and preloading techniques for general-purpose programs
Proceedings of the 28th annual international symposium on Microarchitecture
Memory bandwidth limitations of future microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Limits on the performance benefits of multithreading and prefetching
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
ACM Transactions on Computer Systems (TOCS)
A quantitative analysis of loop nest locality
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The intrinsic bandwidth requirements of ordinary programs
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
An integrated compile-time/run-time software distributed shared memory system
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Hiding communication latency and coherence overhead in software DSMs
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Compiler-directed page coloring for multiprocessors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Data prefetching and multilevel blocking for linear algebra operations
ICS '96 Proceedings of the 10th international conference on Supercomputing
Exceeding the dataflow limit via value prediction
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
A microarchitectural performance evaluation of a 3.2 Gbyte/s microprocessor bus
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Predictability of load/store instruction latencies
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Improving data cache performance by pre-executing instructions under a cache miss
ICS '97 Proceedings of the 11th international conference on Supercomputing
Speculative execution via address prediction and data prefetching
ICS '97 Proceedings of the 11th international conference on Supercomputing
The interaction of software prefetching with ILP processors in shared-memory systems
Proceedings of the 24th annual international symposium on Computer architecture
Data prefetching on the HP PA-8000
Proceedings of the 24th annual international symposium on Computer architecture
A comparison of data prefetching on an access decoupled and superscalar machine
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Cache sensitive modulo scheduling
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory data organization for improved cache performance in embedded processor applications
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Profetching and memory system behavior of the SPEC95 benchmark suite
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
Tolerating latency in multiprocessors through compiler-inserted prefetching
ACM Transactions on Computer Systems (TOCS)
Load execution latency reduction
ICS '98 Proceedings of the 12th international conference on Supercomputing
ICS '98 Proceedings of the 12th international conference on Supercomputing
Hardware-driven prefetching for pointer data references
ICS '98 Proceedings of the 12th international conference on Supercomputing
Utilizing reuse information in data cache management
ICS '98 Proceedings of the 12th international conference on Supercomputing
CPU Cache Prefetching: Timing Evaluation of Hardware Implementations
IEEE Transactions on Computers
Using generational garbage collection to implement cache-conscious data placement
Proceedings of the 1st international symposium on Memory management
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Automatic Compiler-Inserted Prefetching for Pointer-Based Applications
IEEE Transactions on Computers - Special issue on cache memory and related problems
Effects of Multithreading on Cache Performance
IEEE Transactions on Computers - Special issue on cache memory and related problems
Cache-conscious structure layout
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
A General Interprocedural Framework for Placement of Split-Phase Large Latency Operations
IEEE Transactions on Parallel and Distributed Systems
An Integrated Hardware/Software Data Prefetching Scheme for Shared-Memory Multiprocessors
International Journal of Parallel Programming
Empirical investigation of the Markov reference model
Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms
Active Management of Data Caches by Exploiting Reuse Information
IEEE Transactions on Computers
Quantifying loop nest locality using SPEC'95 and the perfect benchmarks
ACM Transactions on Computer Systems (TOCS)
Reducing the impact of software prefetching on register pressure
SAC '00 Proceedings of the 2000 ACM symposium on Applied computing - Volume 2
Matrix multiplication: a case study of enhanced data cache utilization
Journal of Experimental Algorithmics (JEA)
ACM Computing Surveys (CSUR)
Two-level hierarchical register file organization for VLIW processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Access pattern based local memory customization for low power embedded systems
Proceedings of the conference on Design, automation and test in Europe
Source-to-Source Instrumentation for the Optimization of an Automatic Reading System
The Journal of Supercomputing
Dynamic Access Ordering for Streamed Computations
IEEE Transactions on Computers
ICS '01 Proceedings of the 15th international conference on Supercomputing
A novel renaming mechanism that boosts software prefetching
ICS '01 Proceedings of the 15th international conference on Supercomputing
FPGA '02 Proceedings of the 2002 ACM/SIGDA tenth international symposium on Field-programmable gate arrays
Page replacement using marginal loss functions
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Software caching vs. prefetching
Proceedings of the 3rd international symposium on Memory management
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Profile-guided post-link stride prefetching
ICS '02 Proceedings of the 16th international conference on Supercomputing
Computation regrouping: restructuring programs for temporal data cache locality
ICS '02 Proceedings of the 16th international conference on Supercomputing
Modulo scheduling with integrated register spilling for clustered VLIW architectures
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Simple and effective array prefetching in Java
JGI '02 Proceedings of the 2002 joint ACM-ISCOPE conference on Java Grande
Sunder: a programmable hardware prefetch architecture for numerical loops
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
MIST: an algorithm for memory miss traffic management
Proceedings of the 2000 IEEE/ACM international conference on Computer-aided design
Optimizing a Superscalar Machine to Run Vector Code
IEEE Parallel & Distributed Technology: Systems & Technology
The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared Memory Multiprocessor
IEEE Transactions on Parallel and Distributed Systems
Sequential Hardware Prefetching in Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Increasing hardware data prefetching performance using the second-level cache
Journal of Systems Architecture: the EUROMICRO Journal
Stride-directed Prefetching for Secondary Caches
ICPP '97 Proceedings of the international Conference on Parallel Processing
An adaptive sequential prefetching scheme in shared-memory multiprocessors
ICPP '97 Proceedings of the international Conference on Parallel Processing
A Memory Controller for Improved Performance of Streamed Computations on Symmetric Multiprocessors
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Transparent Threads: Resource Sharing in SMT Processors for High Single-Thread Performance
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
A Programmable Memory Hierarchy for Prefetching Linked Data Structures
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Runtime Association of Software Prefetch Control to Memory Access Instructions (Research Note)
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Value-Profile Guided Stride Prefetching for Irregular Code
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Content-Based Prefetching: Initial Results
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
View management in multimedia databases
The VLDB Journal — The International Journal on Very Large Data Bases
Indirect VLIW memory allocation for the ManArray multiprocessor DSP
ACM SIGARCH Computer Architecture News
Stride prefetching by dynamically inspecting objects
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Access ordering and memory-conscious cache utilization
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Prefetching by Self-Contained Variables - a Generalization from Array to Recursive Data Structures
PAS '97 Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis
Memory Organization for Improved Data Cache Performance in Embedded Processors
ISSS '96 Proceedings of the 9th international symposium on System synthesis
Guided region prefetching: a cooperative hardware/software approach
Proceedings of the 30th annual international symposium on Computer architecture
IEEE Transactions on Computers
ACM Transactions on Computer Systems (TOCS)
Tolerating Late Memory Traps in Dynamically Scheduled Processors
IEEE Transactions on Computers
Restructuring computations for temporal data cache locality
International Journal of Parallel Programming
Compiler orchestrated prefetching via speculation and predication
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Tolerating memory latency through push prefetching for pointer-intensive applications
ACM Transactions on Architecture and Code Optimization (TACO)
When prefetching improves/degrades performance
Proceedings of the 2nd conference on Computing frontiers
Exploring the limits of prefetching
IBM Journal of Research and Development - Electrochemical technology in microelectronics
Memory Performance Optimizations For Real-Time Software HDTV Decoding
Journal of VLSI Signal Processing Systems
On the performance of trace locality of reference
Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
Cache-conscious frequent pattern mining on a modern processor
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Hardware Support for Bulk Data Movement in Server Platforms
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Stream Programming on General-Purpose Processors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Speculative execution for hiding memory latency
MEDEA '04 Proceedings of the 2004 workshop on MEmory performance: DEaling with Applications , systems and architecture
Software and hardware techniques to optimize register file utilization in VLIW architectures
International Journal of Parallel Programming
A Self-Repairing Prefetcher in an Event-Driven Dynamic Optimization Framework
Proceedings of the International Symposium on Code Generation and Optimization
Dynamic memory optimization using pool allocation and prefetching
ACM SIGARCH Computer Architecture News - Special issue on the 2005 workshop on binary instrumentation and application
Kilo-instruction processors, runahead and prefetching
Proceedings of the 3rd conference on Computing frontiers
CAVA: Using checkpoint-assisted value prediction to hide L2 misses
ACM Transactions on Architecture and Code Optimization (TACO)
Pattern-driven prefetching for multimedia applications on embedded processors
Journal of Systems Architecture: the EUROMICRO Journal
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Cache-conscious frequent pattern mining on modern and emerging processors
The VLDB Journal — The International Journal on Very Large Data Bases
Proceedings of the 20th annual international conference on Supercomputing
Compilation for explicitly managed memory hierarchies
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Compiler Optimization Technique for Data Cache Prefetching Using a Small CAM Array
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Optimal multistream sequential prefetching in a shared cache
ACM Transactions on Storage (TOS)
Optimization of frequent itemset mining on multiple-core processor
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Latency-tolerant software pipelining in a production compiler
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Hiding cache miss penalty using priority-based execution for embedded processors
Proceedings of the conference on Design, automation and test in Europe
Power-Aware Software Prefetching
ICESS '07 Proceedings of the 3rd international conference on Embedded Software and Systems
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
A load-instruction unit for pipelined processors
IBM Journal of Research and Development
Cache line reservation: exploring a scheme for cache-friendly object allocation
CASCON '09 Proceedings of the 2009 Conference of the Center for Advanced Studies on Collaborative Research
COMPASS: a programmable data prefetcher using idle GPU shaders
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Dependence-based code generation for a CELL processor
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Using prefetching to improve reference-counting garbage collectors
CC'07 Proceedings of the 16th international conference on Compiler construction
A graph theoretic approach to cache-conscious placement of data for direct mapped caches
Proceedings of the 2010 international symposium on Memory management
SAMS multi-layout memory: providing multiple views of data to boost SIMD performance
Proceedings of the 24th ACM International Conference on Supercomputing
Engineering scalable, cache and space efficient tries for strings
The VLDB Journal — The International Journal on Very Large Data Bases
Introducing the semi-stencil algorithm
PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
Redesigning the string hash table, burst trie, and BST to exploit cache
Journal of Experimental Algorithmics (JEA)
Virtual I/O caching: dynamic storage cache management for concurrent workloads
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Optimizing explicit data transfers for data parallel applications on the cell architecture
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
The practice of i/o optimizations for out-of-core computation
PDCAT'04 Proceedings of the 5th international conference on Parallel and Distributed Computing: applications and Technologies
When Prefetching Works, When It Doesn’t, and Why
ACM Transactions on Architecture and Code Optimization (TACO)
Improving the performance of GCC by exploiting IA-64 architectural features
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Cache-Conscious collision resolution in string hash tables
SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
New memory organizations for 3d DRAM and PCMs
ARCS'12 Proceedings of the 25th international conference on Architecture of Computing Systems
Implications of electronics technology trends to algorithm design
VoCS'08 Proceedings of the 2008 international conference on Visions of Computer Science: BCS International Academic Conference
A multi-core memory organization for 3-d DRAM as main memory
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
Diagnosis and optimization of application prefetching performance
Proceedings of the 27th international ACM conference on International conference on supercomputing
Hi-index | 0.01 |