This paper describes the CRAY-1, discusses the evolution of its architecture, and gives an account of some of the problems that were overcome during its manufacture. The CRAY-1 is the only computer to have been built to date that satisfies ERDA's Class VI requirement (a computer capable of processing from 20 to 60 million floating point operations per second) [1].

The CRAY-1's Fortran compiler (CFT) is designed to give the scientific user immediate access to the benefits of the CRAY-1's vector processing architecture. An optimizing compiler, CFT “vectorizes” innermost DO loops. Compatible with the ANSI 1966 Fortran Standard and with many commonly supported Fortran extensions, CFT does not require any source program modifications or the use of additional nonstandard Fortran statements to achieve vectorization. Thus the user's investment of hundreds of man-months of effort to develop Fortran programs for other contemporary computers is protected.
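
To illustrate what vectorizing an innermost DO loop amounts to, the following is a minimal sketch in Python, not a description of how CFT itself works. It assumes a simple loop of the form `Y(I) = A*X(I) + Y(I)` and processes it in strips of at most 64 elements, matching the 64-element length of a CRAY-1 vector register. The function names and the strip-mining structure are illustrative assumptions.

```python
VECTOR_LENGTH = 64  # a CRAY-1 vector register holds 64 elements


def saxpy_scalar(a, x, y):
    """Element-at-a-time form, like: DO 10 I = 1, N / 10 Y(I) = A*X(I) + Y(I)."""
    out = list(y)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out


def saxpy_strip_mined(a, x, y, vl=VECTOR_LENGTH):
    """Same computation, done in strips of at most vl elements.

    Each strip stands in for one group of vector instructions
    (load, multiply, add, store) acting on a whole register at once.
    """
    out = list(y)
    for start in range(0, len(x), vl):
        stop = min(start + vl, len(x))  # the last strip may be shorter
        xs = x[start:stop]              # stands in for a vector load of X
        ys = y[start:stop]              # stands in for a vector load of Y
        out[start:stop] = [a * xi + yi for xi, yi in zip(xs, ys)]
    return out


if __name__ == "__main__":
    x = [float(i) for i in range(150)]
    y = [1.0] * 150
    # 150 elements are handled as strips of 64, 64, and 22.
    print(saxpy_strip_mined(2.0, x, y) == saxpy_scalar(2.0, x, y))
```

The point of the sketch is that the source loop is unchanged from the programmer's perspective; only the way iterations are grouped for execution differs, which is why no source modification is needed.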