Increasing memory bandwidth for vector computations
Proceedings of the international conference on Programming languages and system architectures
Computer architecture: a quantitative approach
Missing the memory wall: the case for processor/memory integration
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Evaluation of multithreaded uniprocessors for commercial application environments
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Thread scheduling for cache locality
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Designing high bandwidth on-chip caches
Proceedings of the 24th annual international symposium on Computer architecture
The energy efficiency of IRAM architectures
Proceedings of the 24th annual international symposium on Computer architecture
Active pages: a computation model for intelligent memory
Proceedings of the 25th annual international symposium on Computer architecture
Proceedings of the 1st international workshop on Software and performance
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Active disks: programming model, algorithms and evaluation
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Efficient management of memory hierarchies in embedded DRAM systems
ICS '99 Proceedings of the 13th international conference on Supercomputing
The processor-memory bottleneck: problems and solutions
Crossroads - Computer architecture
Data Locality Exploitation in the Decomposition of Regular Domain Problems
IEEE Transactions on Parallel and Distributed Systems
Dynamic Access Ordering for Streamed Computations
IEEE Transactions on Computers
ICS '01 Proceedings of the 15th international conference on Supercomputing
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Avoiding initialization misses to the heap
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Early cancellation: an active NIC optimization for time-warp
Proceedings of the sixteenth workshop on Parallel and distributed simulation
Two techniques for reconciling algorithm parallelism with memory constraints
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Scalable parallel coset enumeration: bulk definition and the memory wall
Journal of Symbolic Computation - Computer algebra: Selected papers from ISSAC 2001
IEEE Micro
Deep-Submicron Microprocessor Design Issues
IEEE Micro
A Memory Controller for Improved Performance of Streamed Computations on Symmetric Multiprocessors
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Architectural Support for Data-intensive Applications
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
The New DRAM Interfaces: SDRAM, RDRAM and Variants
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Technology Trends and Adaptive Computing
FPL '01 Proceedings of the 11th International Conference on Field-Programmable Logic and Applications
HAGAR: Efficient Multi-context Graph Processors
FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
A proposal for a new hardware cache monitoring architecture
Proceedings of the 2002 workshop on Memory system performance
Exploring Microprocessor Architectures for Gigascale Integration
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Hierarchical processors-and-memory architecture for high performance computing
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Distributed Prefetch-buffer/Cache Design for High Performance Memory Systems
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
FPGAs vs. CPUs: trends in peak floating-point performance
FPGA '04 Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays
Architectural Support for Uniprocessor and Multiprocessor Active Memory Systems
IEEE Transactions on Computers
Reflections on the memory wall
Proceedings of the 1st conference on Computing frontiers
A first glance at Kilo-instruction based multiprocessors
Proceedings of the 1st conference on Computing frontiers
Profile guided code positioning
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Proceedings of the 18th annual international conference on Supercomputing
CQoS: a framework for enabling QoS in shared caches of CMP platforms
Proceedings of the 18th annual international conference on Supercomputing
Microarchitecture Optimizations for Exploiting Memory-Level Parallelism
Proceedings of the 31st annual international symposium on Computer architecture
Fast, predictable and low energy memory references through architecture-aware compilation
Proceedings of the 2004 Asia and South Pacific Design Automation Conference
Influence of Memory Hierarchies on Predictability for Time Constrained Embedded Software
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Impact of Compiler-based Data-Prefetching Techniques on SPEC OMP Application Performance
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Bandwidth Management with a Reconfigurable Data Cache
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04
An Address Dependence Model of Computation for Hierarchical Memories with Pipelined Transfer
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 8 - Volume 09
Enhancing NIC Performance for MPI using Processing-in-Memory
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 9 - Volume 10
Evaluating kilo-instruction multiprocessors
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Statistical geometry representation for efficient transmission and rendering
ACM Transactions on Graphics (TOG)
Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Overcoming the memory wall in packet processing: hammers or ladders?
Proceedings of the 2005 ACM symposium on Architecture for networking and communications systems
The TM3270 Media-Processor Data Cache
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
DRAMsim: a memory system simulator
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Predicting the Performance of a 3D Processor-Memory Chip Stack
IEEE Design & Test
Queue Usage and Memory-Level Parallelism Sensitive Scheduling
HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
Dynamic memory instruction bypassing
International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
SCMP: a single-chip message-passing parallel computer
The Journal of Supercomputing - Special issue: Parallel and distributed processing and applications
Chip multiprocessing and the cell broadband engine
Proceedings of the 3rd conference on Computing frontiers
Kilo-instruction processors, runahead and prefetching
Proceedings of the 3rd conference on Computing frontiers
The bit-reversal SDRAM address mapping
SCOPES '05 Proceedings of the 2005 workshop on Software and compilers for embedded systems
IBM Journal of Research and Development
A thermally-aware performance analysis of vertically integrated (3-D) processor-memory hierarchy
Proceedings of the 43rd annual Design Automation Conference
Memory bandwidth optimization through stream descriptors
MEDEA '05 Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications, systems and architecture
Energy-efficient instruction scheduling utilizing cache miss information
MEDEA '05 Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications, systems and architecture
Characterization of simultaneous multithreading (SMT) efficiency in POWER5
IBM Journal of Research and Development - POWER5 and packaging
Introduction to the cell multiprocessor
IBM Journal of Research and Development - POWER5 and packaging
The exigency of benchmark and compiler drift: designing tomorrow's processors with yesterday's tools
Proceedings of the 20th annual international conference on Supercomputing
IEEE Transactions on Computers
Locality and parallelism optimization for dynamic programming algorithm in bioinformatics
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
SMP-SoC is the answer if you ask the right questions
SAICSIT '06 Proceedings of the 2006 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries
Cache oblivious algorithms for nonserial polyadic programming
The Journal of Supercomputing
Design and implementation of power-aware virtual memory
ATEC '03 Proceedings of the annual conference on USENIX Annual Technical Conference
Optimizing software cache performance of packet processing applications
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
SCOPES '07 Proceedings of the 10th international workshop on Software & compilers for embedded systems
Light-weight synchronization for inter-processor communication acceleration on embedded MPSoCs
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Mapping streaming architectures on reconfigurable platforms
ACM SIGARCH Computer Architecture News - Special issue on the 2006 reconfigurable and adaptive architecture workshop
Random-Accessible Compressed Triangle Meshes
IEEE Transactions on Visualization and Computer Graphics
Frame shared memory: line-rate networking on commodity hardware
Proceedings of the 3rd ACM/IEEE Symposium on Architecture for networking and communications systems
Configuration and extension of embedded processors to optimize IPSec protocol execution
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Reducing cache misses through programmable decoders
ACM Transactions on Architecture and Code Optimization (TACO)
The cell broadband engine: exploiting multiple levels of parallelism in a chip multiprocessor
International Journal of Parallel Programming
Computers and Operations Research
Stochastic rollout and justification to solve the resource-constrained project scheduling problem
Proceedings of the 39th conference on Winter simulation: 40 years! The best is yet to come
Fast indexing for blocked array layouts to reduce cache misses
International Journal of High Performance Computing and Networking
International Journal of High Performance Computing and Networking
A genetic algorithms approach to modeling the performance of memory-bound computations
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Memory performance attacks: denial of memory service in multi-core systems
SS'07 Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium
Exploiting program cyclic behavior to reduce memory latency in embedded processors
Proceedings of the 2008 ACM symposium on Applied computing
Optimizing thread throughput for multithreaded workloads on memory constrained CMPs
Proceedings of the 5th conference on Computing frontiers
Exploring power reduction options for a single-chip multiprocessor through system-level modeling
Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
HMTT: a platform independent full-system memory trace monitoring system
SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Efficient dynamic heap allocation of scratch-pad memory
Proceedings of the 7th international symposium on Memory management
Server-based data push architecture for multi-processor environments
Journal of Computer Science and Technology
Utilizing shared data in chip multiprocessors with the Nahalal architecture
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
3D-Stacked Memory Architectures for Multi-core Processors
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
PipesFS: fast Linux I/O in the unix tradition
ACM SIGOPS Operating Systems Review - Research and developments in the Linux kernel
Proceedings of the conference on Design, automation and test in Europe
ICCS '07 Proceedings of the 7th international conference on Computational Science, Part I: ICCS 2007
Interprocedural Speculative Optimization of Memory Accesses to Global Variables
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Journal of Parallel and Distributed Computing
Exploiting loop-dependent stream reuse for stream processors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Leveraging on-chip networks for data cache migration in chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Efficient implementation of decoupling capacitors in 3D processor-dram integrated computing systems
Proceedings of the 19th ACM Great Lakes symposium on VLSI
Analysis of challenges for on-chip optical interconnects
Proceedings of the 19th ACM Great Lakes symposium on VLSI
Euro-Par 2008 Workshops - Parallel Processing
Pattern-based sparse matrix representation for memory-efficient SMVM kernels
Proceedings of the 23rd international conference on Supercomputing
Two memory allocators that use hints to improve locality
Proceedings of the 2009 international symposium on Memory management
On approximating the ideal random access machine by physical machines
Journal of the ACM (JACM)
Multi-target C++ implementation of parallel skeletons
Proceedings of the 8th workshop on Parallel/High-Performance Object-Oriented Scientific Computing
Exploiting Locality on the Cell/B.E. through Bypassing
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
SSE Implementation of Multivariate PKCs on Modern x86 CPUs
CHES '09 Proceedings of the 11th International Workshop on Cryptographic Hardware and Embedded Systems
Allocation wall: a limiting factor of Java applications on emerging multi-core platforms
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Predictable performance for unpredictable workloads
Proceedings of the VLDB Endowment
Reevaluating Amdahl's law in the multicore era
Journal of Parallel and Distributed Computing
LIRAC: using live range information to optimize memory access
ARCS'07 Proceedings of the 20th international conference on Architecture of computing systems
ICA3PP'07 Proceedings of the 7th international conference on Algorithms and architectures for parallel processing
Stream image processing on a dual-core embedded system
SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
Direct coherence: bringing together performance and scalability in shared-memory multiprocessors
HiPC'07 Proceedings of the 14th international conference on High performance computing
Exploiting execution locality with a decoupled Kilo-instruction processor
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Scaling power/ground solvers on multi-core with memory bandwidth awareness
Proceedings of the 20th symposium on Great lakes symposium on VLSI
On-chip COMA cache-coherence protocol for microgrids of microthreaded cores
Euro-Par'07 Proceedings of the 2007 conference on Parallel processing
Timing local streams: improving timeliness in data prefetching
Proceedings of the 24th ACM International Conference on Supercomputing
Fast multiplication of large permutations for disk, flash memory and RAM
Proceedings of the 2010 International Symposium on Symbolic and Algebraic Computation
Exploiting the reuse supplied by loop-dependent stream references for stream processors
ACM Transactions on Architecture and Code Optimization (TACO)
An Adaptive Data Prefetcher for High-Performance Processors
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
High Resolution Program Flow Visualization of Hardware Accelerated Hybrid Multi-core Applications
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Parallel search on video cards
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Characterization of Fixed and Reconfigurable Multi-Core Devices for Application Acceleration
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Proceedings of the Conference on Design, Automation and Test in Europe
A robust multigrid solver on parallel computers
EURO-PDP'00 Proceedings of the 8th Euromicro conference on Parallel and distributed processing
An experimental study of optimizing bioinformatics applications
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
SigMatch: fast and scalable multi-pattern matching
Proceedings of the VLDB Endowment
Streaming Data Movement for Real-Time Image Analysis
Journal of Signal Processing Systems
Exploitation of multicore systems in a java virtual machine
IBM Journal of Research and Development
Patterns for cache optimizations on multi-processor machines
Proceedings of the 2010 Workshop on Parallel Programming Patterns
Application-Tailored I/O with Streamline
ACM Transactions on Computer Systems (TOCS)
Should we worry about memory loss?
ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Tackling cache-line stealing effects using run-time adaptation
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Region-based parallelization of irregular reductions on explicitly managed memory hierarchies
The Journal of Supercomputing
Making the Best of Temporal Locality: Just-in-Time Renaming and Lazy Write-Back on the Cell/B.E
International Journal of High Performance Computing Applications
Cache injection for parallel applications
Proceedings of the 20th international symposium on High performance distributed computing
Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Pinned to the walls: impact of packaging and application properties on the memory and power walls
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
CATCH: A mechanism for dynamically detecting cache-content-duplication in instruction caches
ACM Transactions on Architecture and Code Optimization (TACO)
A parallel code for time independent quantum reactive scattering on CPU-GPU platforms
ICCSA'11 Proceedings of the 2011 international conference on Computational science and its applications - Volume Part III
CHES'11 Proceedings of the 13th international conference on Cryptographic hardware and embedded systems
Toward five-dimensional scaling: how density improves efficiency in future computers
IBM Journal of Research and Development
Energy-Efficient Scheduling on Milliclusters with Performance Constraints
GREENCOM '11 Proceedings of the 2011 IEEE/ACM International Conference on Green Computing and Communications
Global-aware and multi-order context-based prefetching for high-performance processors
International Journal of High Performance Computing Applications
Memory access cycle and the measurement of memory systems
Proceedings of the second international workshop on Performance modeling, benchmarking and simulation of high performance computing systems
Scientific computing applications on the imagine stream processor
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Processor directed dynamic page policy
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
The challenges of efficient code-generation for massively parallel architectures
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Compile-Time thread distinguishment algorithm on VIM-Based architecture
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Design and analysis of adaptive processor
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Efficient memory management of a hierarchical and a hybrid main memory for MN-MATE platform
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Collecting and exploiting cache-reuse metrics
ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part II
A memory bandwidth effective cache store miss policy
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Software–hardware cooperative power management for main memory
PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
Compilation and simulation tool chain for memory aware energy optimizations
SAMOS'06 Proceedings of the 6th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Code-based cache partitioning for improving hardware cache performance
Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication
WMTools - assessing parallel application memory utilisation at scale
EPEW'11 Proceedings of the 8th European conference on Computer Performance Engineering
KISS-Tree: smart latch-free in-memory indexing on modern architectures
DaMoN '12 Proceedings of the Eighth International Workshop on Data Management on New Hardware
CCDR-PAID: more efficient cache-conscious PAID algorithm by data reconstruction
Proceedings of the 27th Annual ACM Symposium on Applied Computing
Journal of Computational Physics
Mat-core: a decoupled matrix core extension for general-purpose processors
Neural, Parallel & Scientific Computations
Partitioning and multi-core parallelization of multi-equation forecast models
SSDBM'12 Proceedings of the 24th international conference on Scientific and Statistical Database Management
Making data prefetch smarter: adaptive prefetching on POWER7
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Shared hardware data structures for hard real-time systems
Proceedings of the tenth ACM international conference on Embedded software
A distributed interleaving scheme for efficient access to WideIO DRAM memory
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
APC: a performance metric of memory systems
ACM SIGMETRICS Performance Evaluation Review
Toward on-chip datacenters: a perspective on general trends and on-chip particulars
The Journal of Supercomputing
NUMA-aware graph mining techniques for performance and energy efficiency
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Interactive visualization for memory reference traces
EuroVis'08 Proceedings of the 10th Joint Eurographics / IEEE - VGTC conference on Visualization
Proceedings of the 2013 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools
FPGA programming for the masses
Communications of the ACM
Low power cache architectures with hybrid approach of filtering unnecessary way accesses
Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores
Application task and data placement in embedded many-core NUMA architectures
Proceedings of the 10th Workshop on Optimizations for DSP and Embedded Systems
FPGA Programming for the Masses
Queue - Mobile Web Development
Reducing memory access latency with asymmetric DRAM bank organizations
Proceedings of the 40th Annual International Symposium on Computer Architecture
Resilient die-stacked DRAM caches
Proceedings of the 40th Annual International Symposium on Computer Architecture
Return data interleaving for multi-channel embedded CMPs systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Design and performance evaluation of NUMA-aware RDMA-based end-to-end data transfer systems
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Memory-centric system interconnect design with hybrid memory cubes
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Toward application-specific memory reconfiguration for energy efficiency
E2SC '13 Proceedings of the 1st International Workshop on Energy Efficient Supercomputing
Reviewing traffic classification
Data Traffic Monitoring and Analysis
Apple-CORE: Harnessing general-purpose many-cores with hardware concurrency management
Microprocessors & Microsystems
Direct distributed memory access for CMPs
Journal of Parallel and Distributed Computing
HMTT: A hybrid hardware/software tracing system for bridging the DRAM access trace's semantic gap
ACM Transactions on Architecture and Code Optimization (TACO)
Amesos2 and Belos: Direct and iterative solvers for large sparse linear systems
Scientific Programming
Creating robust high-throughput traffic sign detectors using centre-surround HOG statistics
Machine Vision and Applications