Interleaving: a multithreading technique targeting multiprocessors and workstations
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
A performance methodology for commercial servers
IBM Journal of Research and Development
High-Performance Throughput Computing
IEEE Micro
Maximizing CMP Throughput with Mediocre Cores
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Queue - Multiprocessors
Queue - Multiprocessors
Queue - Multiprocessors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Performance/Watt: the new server focus
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Exploring the cache design space for large scale CMPs
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Fast synchronization for chip multiprocessors
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Performance implications of single thread migration on a chip multi-core
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Power/performance hardware optimization for synchronization intensive applications in MPSoCs
Proceedings of the conference on Design, automation and test in Europe: Proceedings
The Atomos transactional programming language
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Design and Management of 3D Chip Multiprocessors Using Network-in-Memory
Proceedings of the 33rd annual international symposium on Computer Architecture
Cooperative Caching for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
An efficient synchronization technique for multiprocessor systems on-chip
MEDEA '05 Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications, systems and architecture
Architectural support for operating system-driven CMP cache management
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Communist, utilitarian, and capitalist cache policies on CMPs: caches as a shared resource
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
A case for chip multiprocessors based on the data-driven multithreading model
International Journal of Parallel Programming
Reducing power through compiler-directed barrier synchronization elimination
Proceedings of the 2006 international symposium on Low power electronics and design
A regulated transitive reduction (RTR) for longer memory race recording
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
FlashCache: a NAND flash memory file cache for low power web servers
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
High-level power analysis for multi-core chips
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Deconstructing process isolation
Proceedings of the 2006 workshop on Memory system performance and correctness
A flexible data to L2 cache mapping approach for future multicore processors
Proceedings of the 2006 workshop on Memory system performance and correctness
Supporting microthread scheduling and synchronisation in CMPs
International Journal of Parallel Programming
Multigrid and Gauss-Seidel smoothers revisited: parallelization on chip multiprocessors
Proceedings of the 20th annual international conference on Supercomputing
Online power-performance adaptation of multithreaded programs using hardware event-based prediction
Proceedings of the 20th annual international conference on Supercomputing
Design space exploration for multicore architectures: a power/performance/thermal view
Proceedings of the 20th annual international conference on Supercomputing
WormTerminator: an effective containment of unknown and polymorphic fast spreading worms
Proceedings of the 2006 ACM/IEEE symposium on Architecture for networking and communications systems
Fairness and Throughput in Switch on Event Multithreading
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Exploiting Fine-Grained Data Parallelism with Chip Multiprocessors and Fast Barriers
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Distributed Microarchitectural Protocols in the TRIPS Prototype Processor
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
SMP-SoC is the answer if you ask the right questions
SAICSIT '06 Proceedings of the 2006 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries
Executing Java programs with transactional memory
Science of Computer Programming - Special issue: Synchronization and concurrency in object-oriented languages
Multi-processor operating system emulation framework with thermal feedback for systems-on-chip
Proceedings of the 17th ACM Great Lakes symposium on VLSI
Transactional collection classes
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Evaluating the potential of multithreaded platforms for irregular scientific computations
Proceedings of the 4th international conference on Computing frontiers
Fuce: the continuation-based multithreading processor
Proceedings of the 4th international conference on Computing frontiers
Proceedings of the 4th international conference on Computing frontiers
Singularity: rethinking the software stack
ACM SIGOPS Operating Systems Review - Systems work at Microsoft Research
Proximity-aware directory-based coherence for multi-core processor architectures
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Proceedings of the 34th annual international symposium on Computer architecture
Rotary router: an efficient architecture for CMP interconnection networks
Proceedings of the 34th annual international symposium on Computer architecture
Configurable isolation: building high availability systems with commodity multi-core processors
Proceedings of the 34th annual international symposium on Computer architecture
A study of thread migration in temperature-constrained multicores
ACM Transactions on Architecture and Code Optimization (TACO)
Temperature aware task scheduling in MPSoCs
Proceedings of the conference on Design, automation and test in Europe
Enabling scalability and performance in a large scale CMP environment
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Fairness enforcement in switch on event multithreading
ACM Transactions on Architecture and Code Optimization (TACO)
Thermal-aware scheduling for future chip multiprocessors
EURASIP Journal on Embedded Systems
JTRES '07 Proceedings of the 5th international workshop on Java technologies for real-time and embedded systems
Data locality enhancement for CMPs
Proceedings of the 2007 IEEE/ACM international conference on Computer-aided design
Improving disk bandwidth-bound applications through main memory compression
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Characterization of Apache web server with Specweb2005
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Efficient architectural design space exploration via predictive modeling
ACM Transactions on Architecture and Code Optimization (TACO)
Hiding the misprediction penalty of a resource-efficient high-performance processor
ACM Transactions on Architecture and Code Optimization (TACO)
The coming wave of multithreaded chip multiprocessors
International Journal of Parallel Programming
Toward high performance nonblocking software transactional memory
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Adaptive set pinning: managing shared caches in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Larrabee: a many-core x86 architecture for visual computing
ACM SIGGRAPH 2008 papers
Thread scheduling for multi-core platforms
HOTOS'07 Proceedings of the 11th USENIX workshop on Hot topics in operating systems
A case for low-complexity MP architectures
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Scaling performance of interior-point method on large-scale chip multiprocessor system
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Sams: single-affiliation multiple-stride parallel memory scheme
Proceedings of the 2008 workshop on Memory access on future processors: a solved problem?
Proceedings of the 5th conference on Computing frontiers
Software-directed combined CPU/link voltage scaling for NoC-based CMPs
SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
The shared-thread multiprocessor
Proceedings of the 22nd annual international conference on Supercomputing
Orchestrating the execution of stream programs on multicore platforms
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Operational analysis of processor speed scaling
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Early detection and bypassing of trivial operations to improve energy efficiency of processors
Microprocessors & Microsystems
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Technology-Driven, Highly-Scalable Dragonfly Topology
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
TokenTM: Efficient Execution of Large Transactions with Hardware Transactional Memory
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Corona: System Implications of Emerging Nanophotonic Technology
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
MIRA: A Multi-layered On-Chip Interconnect Router Architecture
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Many-core design from a thermal perspective
Proceedings of the 45th annual Design Automation Conference
Microprocessors & Microsystems
Proceedings of the 13th international symposium on Low power electronics and design
Thermal monitoring mechanisms for chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Temperature control of high-performance multi-core platforms using convex optimization
Proceedings of the conference on Design, automation and test in Europe
MCjammer: adaptive verification for multi-core designs
Proceedings of the conference on Design, automation and test in Europe
Aspects in hardware: what do they look like?
Proceedings of the 2008 AOSD workshop on Aspects, components, and patterns for infrastructure software
PicoServer: Using 3D stacking technology to build energy efficient servers
ACM Journal on Emerging Technologies in Computing Systems (JETC)
A novel migration-based NUCA design for chip multiprocessors
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Adaptive Loop Tiling for a Multi-cluster CMP
ICA3PP '08 Proceedings of the 8th international conference on Algorithms and Architectures for Parallel Processing
Techniques for Efficient Software Checking
Languages and Compilers for Parallel Computing
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Improving support for locality and fine-grain sharing in chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Adaptive insertion policies for managing shared caches
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Meeting points: using thread criticality to adapt multicore hardware to parallel regions
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Critical sections: re-emerging scalability concerns for database storage engines
Proceedings of the 4th international workshop on Data management on new hardware
A comparative evaluation of hybrid distributed shared-memory systems
Journal of Systems Architecture: the EUROMICRO Journal
GRAMPS: A programming model for graphics pipelines
ACM Transactions on Graphics (TOG)
Controlling chaos: on safe side-effects in data-parallel operations
Proceedings of the 4th workshop on Declarative aspects of multicore programming
A continuation-based noninterruptible multithreading processor architecture
The Journal of Supercomputing
An approach on distributed and shared dynamic cache partition
DNCOCO'08 Proceedings of the 7th conference on Data networks, communications, computers
A compiler-directed data prefetching scheme for chip multiprocessors
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Improving Search Engines Performance on Multithreading Processors
High Performance Computing for Computational Science - VECPAR 2008
Proceedings of the 2009 SPEC Benchmark Workshop on Computer Performance Evaluation and Benchmarking
Accelerating critical section execution with asymmetric multi-core architectures
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Zero loads: canceling load requests by tracking zero values
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
MC-Sim: an efficient simulation tool for MPSoC designs
Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
A control theory approach for thermal balancing of MPSoC
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Static and dynamic temperature-aware scheduling for multiprocessor SoCs
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Shore-MT: a scalable storage manager for the multicore era
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Implementing high availability memory with a duplication cache
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
The StageNet fabric for constructing resilient multicore systems
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Notary: Hardware techniques to enhance signatures
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Strategies for mapping dataflow blocks to distributed hardware
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
ARC '09 Proceedings of the 5th International Workshop on Reconfigurable Computing: Architectures, Tools and Applications
Rate-based QoS techniques for cache/memory in CMP platforms
Proceedings of the 23rd international conference on Supercomputing
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Service level agreement for multithreaded processors
ACM Transactions on Architecture and Code Optimization (TACO)
ESoftCheck: Removal of Non-vital Checks for Fault Tolerance
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
A multigrain Delaunay mesh generation method for multicore SMT-based architectures
Journal of Parallel and Distributed Computing
Journal of Parallel and Distributed Computing
Time-predictable computer architecture
EURASIP Journal on Embedded Systems - FPGA supercomputing platforms, architectures, and techniques for accelerating computationally complex algorithms
Reactive NUCA: near-optimal block placement and replication in distributed caches
Proceedings of the 36th annual international symposium on Computer architecture
A memory system design framework: creating smart memories
Proceedings of the 36th annual international symposium on Computer architecture
Proceedings of the 36th annual international symposium on Computer architecture
Scalable directory architecture for distributed shared memory chip multiprocessors
ACM SIGARCH Computer Architecture News
Type-Directed Compilation for Multicore Programming
Electronic Notes in Theoretical Computer Science (ENTCS)
Triplet-based topology for on-chip networks
WSEAS Transactions on Computers
SPARTAN: A software tool for Parallelization Bottleneck Analysis
IWMSE '09 Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering
FlexCore: Utilizing Exposed Datapath Control for Efficient Computing
Journal of Signal Processing Systems
Way-tagged cache: an energy-efficient L2 cache architecture under write-through policy
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Journal of Signal Processing Systems
The multikernel: a new OS architecture for scalable multicore systems
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Allocation wall: a limiting factor of Java applications on emerging multi-core platforms
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
A performance evaluation of 2D-mesh, ring, and crossbar interconnects for chip multi-processors
Proceedings of the 2nd International Workshop on Network on Chip Architectures
Reconsidering algorithms for iterative solvers in the multicore era
International Journal of Computational Science and Engineering
Enabling software management for multicore caches with a lightweight hardware support
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Increasing memory miss tolerance for SIMD cores
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Future scaling of processor-memory interfaces
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Optimizing shared cache behavior of chip multiprocessors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Finding representative workloads for computer system design
Finding representative workloads for computer system design
IEEE Transactions on Circuits and Systems for Video Technology
IEEE Transactions on Circuits and Systems for Video Technology
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
TRaX: a multicore hardware architecture for real-time ray tracing
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Iterative induced dipoles computation for molecular mechanics on GPUs
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
Compiler-based data classification for hybrid caching
Proceedings of the 2010 Workshop on Interaction between Compilers and Computer Architecture
Software—Practice & Experience
Compiler directed network-on-chip reliability enhancement for chip multiprocessors
Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems
Minimizing communication in rate-optimal software pipelining for stream programs
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
PIRATE: QoS and performance management in CMP architectures
ACM SIGMETRICS Performance Evaluation Review
Online convex optimization-based algorithm for thermal management of MPSoCs
Proceedings of the 20th symposium on Great lakes symposium on VLSI
Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
The auction: optimizing banks usage in Non-Uniform Cache Architectures
Proceedings of the 24th ACM International Conference on Supercomputing
An approach to resource-aware co-scheduling for CMPs
Proceedings of the 24th ACM International Conference on Supercomputing
A real-time Java chip-multiprocessor
ACM Transactions on Embedded Computing Systems (TECS)
WiDGET: Wisconsin decoupled grid execution tiles
Proceedings of the 37th annual international symposium on Computer architecture
Resistive computation: avoiding the power wall with low-leakage, STT-MRAM based computing
Proceedings of the 37th annual international symposium on Computer architecture
Cohesion: a hybrid memory model for accelerators
Proceedings of the 37th annual international symposium on Computer architecture
Modeling soft errors for data caches and alleviating their effects on data reliability
Microprocessors & Microsystems
Trace-driven optimization of networks-on-chip configurations
Proceedings of the 47th Design Automation Conference
Proceedings of the 7th International Conference on Frontiers of Information Technology
Applied inference: Case studies in microarchitectural design
ACM Transactions on Architecture and Code Optimization (TACO)
ACM Transactions on Architecture and Code Optimization (TACO)
Understanding throughput-oriented architectures
Communications of the ACM
Introduction to the wire-speed processor and architecture
IBM Journal of Research and Development
VoIP performance on multicore platforms
IBM Journal of Research and Development
WAYPOINT: scaling coherence to thousand-core architectures
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Compilation of stream programs for multicore processors that incorporate scratchpad memories
Proceedings of the Conference on Design, Automation and Test in Europe
ORION 2.0: a fast and accurate NoC power and area model for early-stage design space exploration
Proceedings of the Conference on Design, Automation and Test in Europe
Group-caching for NoC based multicore cache coherent systems
Proceedings of the Conference on Design, Automation and Test in Europe
Adaptive prefetching for shared cache based chip multiprocessors
Proceedings of the Conference on Design, Automation and Test in Europe
Process variation aware thread mapping for chip multiprocessors
Proceedings of the Conference on Design, Automation and Test in Europe
Characterization and exploitation of narrow-width loads: the narrow-width cache approach
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Federation: Boosting per-thread performance of throughput-oriented manycore architectures
ACM Transactions on Architecture and Code Optimization (TACO)
IWOMP'05/IWOMP'06 Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming
Online strategies for high-performance power-aware thread execution on emerging multiprocessors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Enhancing L2 organization for CMPs with a center cell
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Compatible phase co-scheduling on a CMP of multi-threaded processors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Architectural Support for Fair Reader-Writer Locking
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Erasing Core Boundaries for Robust and Configurable Performance
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Throughput-Effective On-Chip Networks for Manycore Accelerators
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Adaptive Flow Control for Robust Performance and Energy
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Real-Time Adaptive Background Modeling for Multicore Embedded Systems
Journal of Signal Processing Systems
Cache equalizer: a placement mechanism for chip multiprocessor distributed shared caches
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Replacement policies for shared caches on symmetric multicores: a programmer-centric point of view
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Run-time adaptable on-chip thermal triggers
Proceedings of the 16th Asia and South Pacific Design Automation Conference
Low Power Design for a Multi-core Multi-thread Microprocessor
GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
Efficient synchronization for embedded on-chip multiprocessors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Analysis of execution efficiency in the microthreaded processor UTLEON3
ARCS'11 Proceedings of the 24th international conference on Architecture of computing systems
Memory-, bandwidth-, and power-aware multi-core for a graph database workload
ARCS'11 Proceedings of the 24th international conference on Architecture of computing systems
Emulation-based transient thermal modeling of 2D/3D systems-on-chip with active cooling
Microelectronics Journal
Research note: C-AMTE: A location mechanism for flexible cache management in chip multiprocessors
Journal of Parallel and Distributed Computing
Robust adaptation to available parallelism in transactional memory applications
Transactions on high-performance embedded architectures and compilers III
International Journal of Parallel Programming
METE: meeting end-to-end QoS in multicores through system-wide resource management
Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
CRIB: consolidated rename, issue, and bypass
Proceedings of the 38th annual international symposium on Computer architecture
CPPC: correctable parity protected cache
Proceedings of the 38th annual international symposium on Computer architecture
Energy-efficient mechanisms for managing thread context in throughput processors
Proceedings of the 38th annual international symposium on Computer architecture
Proceedings of the 38th annual international symposium on Computer architecture
Moguls: a model to explore the memory hierarchy for bandwidth improvements
Proceedings of the 38th annual international symposium on Computer architecture
Mobile processors for energy-efficient web search
ACM Transactions on Computer Systems (TOCS)
METE: meeting end-to-end QoS in multicores through system-wide resource management
ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
Branch penalty reduction on IBM cell SPUs via software branch hinting
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Processing (multiple) spatio-temporal range queries in multicore settings
ADBIS'11 Proceedings of the 15th international conference on Advances in databases and information systems
Convex-based thermal management for 3D MPSoCs using DVFS and variable-flow liquid cooling
PATMOS'11 Proceedings of the 21st international conference on Integrated circuit and system design: power and timing modeling, optimization, and simulation
A study of 3D Network-on-Chip design for data parallel H.264 coding
Microprocessors & Microsystems
Loop selection for thread-level speculation
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
ABS: A low-cost adaptive controller for prefetching in a banked shared last-level cache
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
DAPSCO: Distance-aware partially shared cache organization
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Throughput computing with chip multithreading and clusters
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Proceedings of the International Conference on Computer-Aided Design
Improving System Energy Efficiency with Memory Rank Subsetting
ACM Transactions on Architecture and Code Optimization (TACO)
Efficient trace-driven metaheuristics for optimization of networks-on-chip configurations
Proceedings of the International Conference on Computer-Aided Design
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
CRUISE: cache replacement and utility-aware scheduling
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Performance/Thermal-Aware Design of 3D-Stacked L2 Caches for CMPs
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Preventing denial-of-service attacks in shared CMP caches
SAMOS'06 Proceedings of the 6th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Tiled multi-core stream architecture
Transactions on High-Performance Embedded Architectures and Compilers IV
Improving performance by reducing aborts in hardware transactional memory
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Can manycores support the memory requirements of scientific applications?
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Machine learning based performance prediction for multi-core simulation
MIWAI'11 Proceedings of the 5th international conference on Multi-Disciplinary Trends in Artificial Intelligence
A weighted-fair-queuing (WFQ)-based dynamic request scheduling approach in a multi-core system
Future Generation Computer Systems
High radix self-arbitrating switch fabric with multiple arbitration schemes and quality of service
Proceedings of the 49th Annual Design Automation Conference
Courteous cache sharing: being nice to others in capacity management
Proceedings of the 49th Annual Design Automation Conference
Load balancing for processing spatio-temporal queries in multi-core settings
MobiDE '12 Proceedings of the Eleventh ACM International Workshop on Data Engineering for Wireless and Mobile Access
Practically private: enabling high performance CMPs through compiler-assisted data classification
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
DIMSim: a rapid two-level cache simulation approach for deadline-based MPSoCs
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
APC: a performance metric of memory systems
ACM SIGMETRICS Performance Evaluation Review
Integration, the VLSI Journal
MAGE: adaptive granularity and ECC for resilient and power efficient memory systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Online thermal control methods for multiprocessor systems
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special section on adaptive power management for energy and temperature-aware computing systems
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Synchronization mechanisms on modern multi-core architectures
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Accelerated parallel genetic programming tree evaluation with OpenCL
Journal of Parallel and Distributed Computing
Understanding fundamental design choices in single-ISA heterogeneous multicore architectures
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
ACM Transactions on Architecture and Code Optimization (TACO)
Automatic parallelization of canonical loops
Science of Computer Programming
An energy-efficient L2 cache architecture using way tag information under write-through policy
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Exploring the Tradeoffs between Programmability and Efficiency in Data-Parallel Accelerators
ACM Transactions on Computer Systems (TOCS)
Exploring the vulnerability of CMPs to soft errors with 3D stacked nonvolatile memory
ACM Journal on Emerging Technologies in Computing Systems (JETC)
Hazard driven test generation for SMT processors
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
CATRA- congestion aware trapezoid-based routing algorithm for on-chip networks
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
A hybrid HW-SW approach for intermittent error mitigation in streaming-based embedded systems
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Designing on-chip networks for throughput accelerators
ACM Transactions on Architecture and Code Optimization (TACO)
ICCSA'13 Proceedings of the 13th international conference on Computational Science and Its Applications - Volume 1
P2EST: parallelization philosophies for evaluating spatio-temporal queries
Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data
Market mechanisms for managing datacenters with heterogeneous microarchitectures
ACM Transactions on Computer Systems (TOCS)
Integrated 3D-stacked server designs for increasing physical density of key-value stores
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
A Host-Based Approach for Unknown Fast-Spreading Worm Detection and Containment
ACM Transactions on Autonomous and Adaptive Systems (TAAS) - Special Section on Best Papers from SEAMS 2012
Exploiting replication to improve performances of NUCA-based CMP systems
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
WCET analysis with MRU cache: Challenging LRU for predictability
ACM Transactions on Embedded Computing Systems (TECS)
The Niagara processor implements a thread-rich architecture designed to provide a high-performance solution for commercial server applications. The hardware supports 32 threads and integrates its memory subsystem, consisting of a crossbar, a level-2 cache, and memory controllers, on the chip. This highly integrated design exploits the thread-level parallelism inherent in server applications while targeting low power consumption.
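The throughput argument above can be illustrated with a simple, idealized analytical model of fine-grained multithreading. In this model, a core's hardware threads take turns on one pipeline, so while some threads wait on memory, others keep the pipeline busy. This is a sketch under stated assumptions and not Niagara's actual design method. The split of the 32 threads into 8 cores of 4 threads each, the single-issue pipeline, and the cycle counts used below are assumptions for illustration. The model also ignores contention in the crossbar, the L2 cache, and the memory controllers. The names CoreModel and chip_ipc are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreModel:
    """Hypothetical, idealized model of one fine-grained multithreaded core."""
    threads: int         # hardware threads sharing one pipeline (assumed)
    run_cycles: float    # cycles a thread computes before a long-latency miss (assumed)
    miss_latency: float  # cycles a thread waits on memory per miss (assumed)

    def utilization(self) -> float:
        # One thread alone keeps the pipeline busy run/(run + miss) of the time.
        per_thread = self.run_cycles / (self.run_cycles + self.miss_latency)
        # Idealization: the stalls of different threads overlap perfectly, so
        # utilization grows linearly with thread count until it hits 100%.
        return min(1.0, self.threads * per_thread)


def chip_ipc(cores: int, core: CoreModel) -> float:
    # Assumes single-issue cores, so per-core utilization equals per-core IPC.
    # Shared-resource contention (crossbar, L2, memory) is ignored.
    return cores * core.utilization()


if __name__ == "__main__":
    # Illustrative numbers only: 25 compute cycles per 75-cycle memory stall.
    one_thread = CoreModel(threads=1, run_cycles=25, miss_latency=75)
    four_threads = CoreModel(threads=4, run_cycles=25, miss_latency=75)
    print(one_thread.utilization())    # 0.25
    print(four_threads.utilization())  # 1.0
    print(chip_ipc(8, four_threads))   # 8.0 (8 cores x 4 threads = 32 threads)
```

With these assumed values, a single thread keeps its pipeline busy only a quarter of the time. In this idealized model, four interleaved threads are enough to saturate it. Real workloads would fall short of the model's ceiling once the shared L2 cache and memory bandwidth become bottlenecks.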