The Tera computer system

Authors:
Robert Alverson;David Callahan;Daniel Cummings;Brian Koblenz;Allan Porterfield;Burton Smith
Affiliations:
Tera Computer Company, Seattle, Washington;Tera Computer Company, Seattle, Washington;Tera Computer Company, Seattle, Washington;Tera Computer Company, Seattle, Washington;Tera Computer Company, Seattle, Washington;Tera Computer Company, Seattle, Washington
Venue:
ICS '90 Proceedings of the 4th international conference on Supercomputing
Year:
1990

Citing 4
Cited 233

The horizon supercomputing system: architecture and software

Proceedings of the 1988 ACM/IEEE conference on Supercomputing
A processor architecture for horizon

Proceedings of the 1988 ACM/IEEE conference on Supercomputing
Analysis of a 3D toroidal network for a shared memory architecture

Proceedings of the 1988 ACM/IEEE conference on Supercomputing
Software for Doubled-Precision Floating-Point Computations

ACM Transactions on Mathematical Software (TOMS)

An overview of supertoroidal networks

SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Reducing memory contention in shared memory multiprocessors

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Executing DSP Applications in a Fine-Grained Dataflow Environment

IEEE Transactions on Software Engineering
T: a multithreaded massively parallel architecture

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Processor coupling: integrating compile time and runtime scheduling for parallelism

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Improved multithreading techniques for hiding communication latency in multiprocessors

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Specifying non-blocking shared memories (extended abstract)

SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
Automatic software cache coherence through vectorization

ICS '92 Proceedings of the 6th international conference on Supercomputing
Exploiting heterogeneous parallelism on a multithreaded multiprocessor

ICS '92 Proceedings of the 6th international conference on Supercomputing
Manchester data-flow: a progress report

ICS '92 Proceedings of the 6th international conference on Supercomputing
Dynamic object management for distributed data structures

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Multithreaded computer systems

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Balanced scheduling: instruction scheduling when memory latency is uncertain

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Register relocation: flexible contexts for multithreading

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
A comparison of adaptive wormhole routing algorithms

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The Wisconsin Wind Tunnel: virtual prototyping of parallel computers

SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Designing interconnection networks for multi-level packaging

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Compiling for shared-memory and message-passing computers

ACM Letters on Programming Languages and Systems (LOPLAS)
Reducing indirect function call overhead in C++ programs

POPL '94 Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of programming languages
On testing cache-coherent shared memories

SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Fault-tolerant wormhole routing in tori

ICS '94 Proceedings of the 8th international conference on Supercomputing
Increasing network bandwidth on meshes

SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Programming, compilation, and resource management issues for multithreading (panel session II)

ACM SIGARCH Computer Architecture News - Special issue: panel sessions of the 1991 workshop on multithreaded computers
Design and implementation of a prototype optical deflection network

SIGCOMM '94 Proceedings of the conference on Communications architectures, protocols and applications
A comparison of message passing and shared memory architectures for data parallel programs

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Virtual memory mapped network interface for the SHRIMP multicomputer

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Impact of sharing-based thread placement on multithreaded architectures

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Hardware and software support for efficient exception handling

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Reducing branch costs via branch alignment

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Hardware support for fast capability-based addressing

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The effectiveness of multiple hardware contexts

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Corpus-based static branch prediction

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Optimistic active messages: a mechanism for scheduling communication with computation

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Accounting for memory bank contention and delay in high-bandwidth multiprocessors

Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Universal congestion control for meshes

Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
On characterizing bandwidth requirements of parallel applications

Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
High performance messaging on workstations: Illinois fast messages (FM) for Myrinet

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Performance evaluation of a parallel I/O architecture

ICS '95 Proceedings of the 9th international conference on Supercomputing
Increasing superscalar performance through multistreaming

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
A design study of the EARTH multiprocessor

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Analysis of communications and overhead reduction in multithreaded execution

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Control of loop parallelism in multithreaded code

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
A partitioning-independent paradigm for nested data parallelism

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
The M-Machine multicomputer

Proceedings of the 28th annual international symposium on Microarchitecture
A Framework for Designing Deadlock-Free Wormhole Routing Algorithms

IEEE Transactions on Parallel and Distributed Systems
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Evaluation of multithreaded uniprocessors for commercial application environments

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Informing memory operations: providing memory performance feedback in modern processors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Limits on the performance benefits of multithreading and prefetching

Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
An evaluation of memory consistency models for shared-memory systems with ILP processors

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Whole-program optimization for time and space efficient threads

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Thread scheduling for cache locality

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
A template for non-uniform parallel loops based on dynamic scheduling and prefetching techniques

ICS '96 Proceedings of the 10th international conference on Supercomputing
Evidence-based static branch prediction using machine learning

ACM Transactions on Programming Languages and Systems (TOPLAS)
Trap-driven memory simulation with Tapeworm II

ACM Transactions on Modeling and Computer Simulation (TOMACS)
Multithreading with Distributed Functional Units

IEEE Transactions on Computers
An evaluation of bottom-up and top-down thread generation techniques

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Performance Analysis of Buffering Schemes in Wormhole Routers

IEEE Transactions on Computers
Triplex: a multi-class routing algorithm

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Fine-grain multithreading with the EM-X multiprocessor

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
From algorithm parallelism to instruction-level parallelism: an encode-decode chain using prefix-sum

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Thread partitioning and scheduling based on cost model

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Effective fine-grain synchronization for automatically parallelized programs using optimistic synchronization primitives

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
An Efficient Task Allocation Scheme for 2D Mesh Architectures

IEEE Transactions on Parallel and Distributed Systems
Accounting for Memory Bank Contention and Delay in High-Bandwidth Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Job Scheduling in Mesh Multicomputers

IEEE Transactions on Parallel and Distributed Systems
On Submesh Allocation for Mesh Multicomputers: A Best-Fit Allocation and a Virtual Submesh Allocation for Faulty Meshes

IEEE Transactions on Parallel and Distributed Systems
How “hard” is thread partitioning and how “bad” is a list scheduling based partitioning algorithm?

Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Explicit multi-threading (XMT) bridging models for instruction parallelism (extended abstract)

Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
A lower bound on the local time complexity of universal constructions

PODC '98 Proceedings of the seventeenth annual ACM symposium on Principles of distributed computing
MBCF: a protected and virtualized high-speed user-level memory-based communication facility

ICS '98 Proceedings of the 12th international conference on Supercomputing
Support for Efficient Programming on the SB-PRAM

International Journal of Parallel Programming
Informing memory operations: memory performance feedback mechanisms and their applications

ACM Transactions on Computer Systems (TOCS)
Exploiting fine-grain thread level parallelism on the MIT multi-ALU processor

Proceedings of the 25th annual international symposium on Computer architecture
A Performance Evaluation of the Convex SPP-1000 Scalable Shared Memory Parallel Computer

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Retrospective: a preliminary architecture for a basic data flow processor

25 years of the international symposia on Computer architecture (selected papers)
Virtual memory mapped network interface for the SHRIMP multicomputer

25 years of the international symposia on Computer architecture (selected papers)
Tempest and typhoon: user-level shared memory

25 years of the international symposia on Computer architecture (selected papers)
Simultaneous multithreading: maximizing on-chip parallelism

25 years of the international symposia on Computer architecture (selected papers)
Effects of Multithreading on Cache Performance

IEEE Transactions on Computers - Special issue on cache memory and related problems
An Application-Driven Study of Parallel System Overheads and Network Bandwidth Requirements

IEEE Transactions on Parallel and Distributed Systems
An Algorithm-Hardware-System Approach to VLIW Multimedia Processors

Journal of VLSI Signal Processing Systems - special issue on multimedia signal processing
An Efficient Submesh Allocation Scheme for Two-Dimensional Meshes with Little Overhead

IEEE Transactions on Parallel and Distributed Systems
A new “quad-tree-based” sub-system allocation technique for mesh-connected parallel machines

ICS '99 Proceedings of the 13th international conference on Supercomputing
The QRQW PRAM: accounting for contention in parallel algorithms

SODA '94 Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms
Concurrent Event Handling through Multithreading

IEEE Transactions on Computers
Instruction fetch mechanisms for multipath execution processors

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Software-Directed Register Deallocation for Simultaneous Multithreaded Processors

IEEE Transactions on Parallel and Distributed Systems
Design Alternatives of Multithreaded Architecture

International Journal of Parallel Programming
Effective fine-grain synchronization for automatically parallelized programs using optimistic synchronization primitives

ACM Transactions on Computer Systems (TOCS)
ILP versus TLP on SMT

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Evaluating Titanium SPMD programs on the Tera MTA

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
A study of common pitfalls in simple multi-threaded programs

Proceedings of the thirty-first SIGCSE technical symposium on Computer science education
Lower Bounds on Communication Loads and Optimal Placements in Torus Networks

IEEE Transactions on Computers
Automatic compiler techniques for thread coarsening for multithreaded architectures

Proceedings of the 14th international conference on Supercomputing
Tuning Compiler Optimizations for Simultaneous Multithreading

International Journal of Parallel Programming - Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part II
Symbiotic jobscheduling for a simultaneous multithreading processor

ACM SIGPLAN Notices
Relational profiling: enabling thread-level parallelism in virtual machines

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
A Unified Formulation of Honeycomb and Diamond Networks

IEEE Transactions on Parallel and Distributed Systems
α-coral: a multigrain, multithreaded processor architecture

ICS '01 Proceedings of the 15th international conference on Supercomputing
Towards a first vertical prototyping of an extremely fine-grained parallel programming approach

Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Symbiotic jobscheduling for a simultaneous multithreaded processor

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
The Sisal project: real world functional programming

Compiler optimizations for scalable parallel systems
Tolerating communication latency through dynamic thread invocation in a multithreaded architecture

Compiler optimizations for scalable parallel systems
Asynchrony in parallel computing: from dataflow to multithreading

Progress in computer research
Scheduled Dataflow: Execution Paradigm, Architecture, and Performance Evaluation

IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Fine-Grained Multithreading with Process Calculi

IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
The Impulse Memory Controller

IEEE Transactions on Computers
Integrated Network Barriers

IEEE Transactions on Parallel and Distributed Systems
A Fast and Efficient Processor Allocation Scheme for Mesh-Connected Multicomputers

IEEE Transactions on Computers
Tera hardware-software cooperation

SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Symbiotic jobscheduling with priorities for a simultaneous multithreading processor

SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Handling long-latency loads in a simultaneous multithreading processor

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Optimal organizations for pipelined hierarchical memories

Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Ray tracing on programmable graphics hardware

Proceedings of the 29th annual conference on Computer graphics and interactive techniques
Context-based compression of binary images in parallel

Software—Practice & Experience
Enhancing Functional and Irregular Parallelism: Stateful Functions and their Semantics

International Journal of Parallel Programming
Post-placement C-slow retiming for the Xilinx Virtex FPGA

FPGA '03 Proceedings of the 2003 ACM/SIGDA eleventh international symposium on Field programmable gate arrays
Parallel I/O Subsystems in Massively Parallel Supercomputers

IEEE Parallel & Distributed Technology: Systems & Technology
Crash Analysis on the Tera MTA

IEEE Computational Science & Engineering
Cache-Only Memory Architectures

Computer
Simultaneous Multithreading: A Platform for Next-Generation Processors

IEEE Micro
The MAJC Architecture: A Synthesis of Parallelism and Scalability

IEEE Micro
The Impact of Pipelined Channels on k-ary n-Cube Networks

IEEE Transactions on Parallel and Distributed Systems
Allocating Precise Submeshes in Mesh Connected Systems

IEEE Transactions on Parallel and Distributed Systems
Performance Analysis of Four Memory Consistency Models for Multithreaded Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
A survey of processors with explicit multithreading

ACM Computing Surveys (CSUR)
Work-optimal simulation of PRAM models on meshes

Nordic Journal of Computing
Return-Address Prediction in Speculative Multithreaded Environments

HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
A Class of Fixed-Degree Cayley-Graph Interconnection Networks Derived by Pruning k-ary n-cubes

ICPP '97 Proceedings of the international Conference on Parallel Processing
Effects of Multithreading on Data and Workload Distribution for Distributed-Memory Multiprocessors

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Latency Tolerance: A Metric for Performance Analysis of Multithreaded Architectures

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Limits of Task-Based Parallelism in Irregular Applications

ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Performance of MP3D on the SB-PRAM Prototype (Research Note)

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Real PRAM Programming

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Highly Concurrent Locking in Shared Memory Database Systems

Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
An Evaluation of Optimized Threaded Code Generation

PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Two Fundamental Limits on Dataflow Multiprocessing

PACT '93 Proceedings of the IFIP WG10.3 Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism
Memory System Support for Irregular Applications

LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Processor Allocation in the Mesh Multiprocessors Using the Leapfrog Method

IEEE Transactions on Parallel and Distributed Systems
Hyperthreading Technology in the Netburst Microarchitecture

IEEE Micro
Improving server software support for simultaneous multithreaded processors

Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Parallelizing a DNA Simulation Code for the Cray MTA-2

CSB '02 Proceedings of the IEEE Computer Society Conference on Bioinformatics
Efficient and balanced adaptive routing in two-dimensional meshes

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Design and performance evaluation of a multithreaded architecture

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Modeling virtual channel flow control in hypercubes

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Performance Study of a Multithreaded Superscalar Microprocessor

HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Measurement and Modeling of EARTH-MANNA Multithreaded Architecture

MASCOTS '96 Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
An Implementation of the SSF Scalable Simulation Framework on the Cray MTA

Proceedings of the seventeenth workshop on Parallel and distributed simulation
The Sisal Model of Functional Programming and its Implementation

PAS '97 Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis
Timed Petri net models of multithreaded multiprocessor architectures

PNPM '97 Proceedings of the 6th International Workshop on Petri Nets and Performance Models
Power-Sensitive Multithreaded Architecture

ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Simultaneous Multithreading-Based Routers

ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
Controlling the data space of tree structured computations

Information and Computation
Incomplete k-ary n-cube and its derivatives

Journal of Parallel and Distributed Computing
Balanced scheduling: instruction scheduling when memory latency is uncertain

ACM SIGPLAN Notices - Best of PLDI 1979-1999
On fault tolerance of 3-dimensional mesh networks

Journal of Computer Science and Technology
Task migration in n-dimensional wormhole-routed mesh multicomputers

Journal of Systems Architecture: the EUROMICRO Journal
Performance and modularity benefits of message-driven execution

Journal of Parallel and Distributed Computing
Safely exploiting multithreaded processors to tolerate memory latency in real-time systems

Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Control Flow Optimization Via Dynamic Reconvergence Prediction

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Early Experience with Scientific Programs on the Cray MTA-2

Proceedings of the 2003 ACM/IEEE conference on Supercomputing
ELDORADO

Proceedings of the 2nd conference on Computing frontiers
Multigrain parallel Delaunay Mesh generation: challenges and opportunities for multithreaded architectures

Proceedings of the 19th annual international conference on Supercomputing
The Future of Microprocessors

Queue - Multiprocessors
Chip multithreading systems need a new operating system scheduler

Proceedings of the 11th workshop on ACM SIGOPS European workshop
A Low-Power Multithreaded Processor for Software Defined Radio

Journal of VLSI Signal Processing Systems
HeapMon: a helper-thread approach to programmable, automatic, and low-overhead memory bug detection

IBM Journal of Research and Development
POWER5 System microarchitecture

IBM Journal of Research and Development - POWER5 and packaging
Ultra low-cost defect protection for microprocessor pipelines

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
CAPSULE: Hardware-Assisted Parallel Execution of Component-Based Programs

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Real-time rendering systems in 2010

SIGGRAPH '05 ACM SIGGRAPH 2005 Courses
Ray tracing on programmable graphics hardware

SIGGRAPH '05 ACM SIGGRAPH 2005 Courses
A comparison of the effect of branch prediction on multithreaded and scalar architectures

ACM SIGARCH Computer Architecture News
Probabilistic analysis on mesh network fault tolerance

Journal of Parallel and Distributed Computing
The design and development of ZPL

Proceedings of the third ACM SIGPLAN conference on History of programming languages
Impulse: Memory system support for scientific applications

Scientific Programming
Performance of multithreaded chip multiprocessors and implications for operating system design

ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Synchronization state buffer: supporting efficient fine-grain synchronization on many-core architectures

Proceedings of the 34th annual international symposium on Computer architecture
Multithreaded architecture for multimedia processing

Integrated Computer-Aided Engineering
Synchronization coherence: A transparent hardware mechanism for cache coherence and fine-grained synchronization

Journal of Parallel and Distributed Computing
FPGA-based prototype of a PRAM-on-chip processor

Proceedings of the 5th conference on Computing frontiers
Efficient implementation of constant coefficient division under quantization constraints

ICC'05 Proceedings of the 9th International Conference on Circuits
An area-efficient high-throughput hybrid interconnection network for single-chip parallel processing

Proceedings of the 45th annual Design Automation Conference
Assessing Programming Costs of Explicit Memory Localization on a Large Scale Shared Memory Multiprocessor

Scientific Programming
Design and performance evaluation of combined first-fit task allocation and migration strategies in mesh multiprocessor systems

Parallel Computing
Benchmarking GPUs to tune dense linear algebra

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Optimizing Memory Access Latencies on a Reconfigurable Multimedia Accelerator: A Case of a Turbo Product Codes Decoder

ARC '09 Proceedings of the 5th International Workshop on Reconfigurable Computing: Architectures, Tools and Applications
On approximating the ideal random access machine by physical machines

Journal of the ACM (JACM)
A case for bufferless routing in on-chip networks

Proceedings of the 36th annual international symposium on Computer architecture
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation

Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
Configurable emulated shared memory architecture for general purpose MP-SOCs and NOC regions

NOCS '09 Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip
High Performance Matrix Multiplication on Many Cores

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Hybrid multithreading for VLIW processors

CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
A multithreaded PowerPC processor for commercial servers

IBM Journal of Research and Development
Transient blocking synchronization

Transient blocking synchronization
Mesh-of-trees and alternative interconnection networks for single-chip parallelism

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Lower bounds on the connectivity probability for 2-D mesh networks

WiCOM'09 Proceedings of the 5th International Conference on Wireless communications, networking and mobile computing
Algorithms for memory hierarchies: advanced lectures

Algorithms for memory hierarchies: advanced lectures
MIPS MT: a multithreaded RISC architecture for embedded real-time processing

HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Algorithmic approach to designing an easy-to-program system: Can it lead to a HW-enhanced programmer's workflow add-on?

ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Inter-task communication via overlapping read and write windows for deadlock-free execution of cyclic task graphs

SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation
A case for FAME: FPGA architecture model execution

Proceedings of the 37th annual international symposium on Computer architecture
Understanding throughput-oriented architectures

Communications of the ACM
The Sharing Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory Traffic with Non-Coherent Caches

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Improving SMT performance scheduling processes

EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
Chip-size evaluation of a multithreaded processor enhanced with a PID controller

SEUS'10 Proceedings of the 8th IFIP WG 10.2 international conference on Software technologies for embedded and ubiquitous systems
Architectural Support for Fair Reader-Writer Locking

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Adaptive Flow Control for Robust Performance and Energy

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Landing stencil code on Godson-T

Journal of Computer Science and Technology
The elephant and the mice: the role of non-strict fine-grain synchronization for modern many-core architectures

Proceedings of the international conference on Supercomputing
Energy-efficient mechanisms for managing thread context in throughput processors

Proceedings of the 38th annual international symposium on Computer architecture
A study on factors influencing power consumption in multithreaded and multicore CPUs

WSEAS Transactions on Computers
Crunching large graphs with commodity processors

HotPar'11 Proceedings of the 3rd USENIX conference on Hot topics in parallelism
Introducing mNUMA: an extended PGAS architecture

Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model
Reassortment Networks and the Evolution of Pandemic H1N1 Swine-Origin Influenza

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Exploring irregular memory accesses on FPGAs

Proceedings of the first workshop on Irregular applications: architectures and algorithms
Static partitioning vs dynamic sharing of resources in simultaneous multithreading microarchitectures

APPT'05 Proceedings of the 6th international conference on Advanced Parallel Processing Technologies
Upper bounds on the connection probability for 2-D meshes and tori

Journal of Parallel and Distributed Computing
Fault tolerance analysis of mesh networks with uniform versus nonuniform node failure probability

Information Processing Letters
Software simultaneous multi-threading, a technique to exploit task-level parallelism to improve instruction- and data-level parallelism

PATMOS'06 Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
SuperCoP: a general, correct, and performance-efficient supervised memory system

Proceedings of the 9th conference on Computing Frontiers
Comparing four classes of torus-based parallel architectures: Network parameters and communication performance

Mathematical and Computer Modelling: An International Journal
Extendable pattern-oriented optimization directives

ACM Transactions on Architecture and Code Optimization (TACO)
Support for fine-grained synchronization in shared-memory multiprocessors

PaCT'07 Proceedings of the 9th international conference on Parallel Computing Technologies
Parallel solution of the subset-sum problem: an empirical study

Concurrency and Computation: Practice & Experience
Reducing memory access latency with asymmetric DRAM bank organizations

Proceedings of the 40th Annual International Symposium on Computer Architecture
Compiled multithreaded data paths on FPGAs for dynamic workloads

Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
A memory access model for highly-threaded many-core architectures

Future Generation Computer Systems

Quantified Score

Hi-index	0.03
