Toward a dataflow/von Neumann hybrid architecture
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
MASA: a multithreaded processor architecture for parallel symbolic computing
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
A processor architecture for horizon
Proceedings of the 1988 ACM/IEEE conference on Supercomputing
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Can dataflow subsume von Neumann computing?
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Strategies for achieving improved processor throughput
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
APRIL: a processor architecture for multiprocessing
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Boosting beyond static scheduling in a superscalar processor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
A Fortran compiler for the FPS-164 scientific computer
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Exploiting instruction-level parallelism: the multithreaded approach
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Ordered multithreading: a novel technique for exploiting thread-level parallelism
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Improving single-process performance with multithreaded processors
ICS '96 Proceedings of the 10th international conference on Supercomputing
Multithreading with Distributed Functional Units
IEEE Transactions on Computers
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading
ACM Transactions on Computer Systems (TOCS)
Simultaneous multithreading: maximizing on-chip parallelism
25 years of the international symposia on Computer architecture (selected papers)
Proceedings of the 1999 ACM symposium on Applied computing
Simultaneous subordinate microthreading (SSMT)
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
An Algorithm-Hardware-System Approach to VLIW Multimedia Processors
Journal of VLSI Signal Processing Systems - special issue on multimedia signal processing
Compiler Techniques for the Superthreaded Architectures
International Journal of Parallel Programming
The Superthreaded Processor Architecture
IEEE Transactions on Computers
Hardware identification of cache conflict misses
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
The use of multithreading for exception handling
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Software-Directed Register Deallocation for Simultaneous Multithreaded Processors
IEEE Transactions on Parallel and Distributed Systems
Design Alternatives of Multithreaded Architecture
International Journal of Parallel Programming
Symbiotic jobscheduling for a simultaneous mutlithreading processor
ACM SIGPLAN Notices
Improving 3D geometry transformations on a simultaneous multithreaded SIMD processor
ICS '01 Proceedings of the 15th international conference on Supercomputing
Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Runtime identification of cache conflict misses: The adaptive miss buffer
ACM Transactions on Computer Systems (TOCS)
Improving Latency Tolerance of Multithreading through Decoupling
IEEE Transactions on Computers
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
Symbiotic jobscheduling with priorities for a simultaneous multithreading processor
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Handling long-latency loads in a simultaneous multithreading processor
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
A survey of processors with explicit multithreading
ACM Computing Surveys (CSUR)
Amir Roth: Speculative Multithreaded Processors
HiPC '00 Proceedings of the 7th International Conference on High Performance Computing
Analytic Performance Modeling for a Spectrum of Multithreaded Processor Architectures
MASCOTS '95 Proceedings of the 3rd International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems
Microprocessors - 10 Years Back, 10 Years Ahead
Informatics - 10 Years Back. 10 Years Ahead.
The effects of STEF in finely parallel multithreaded processors
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Design and performance evaluation of a multithreaded architecture
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Timed Petri net models of multithreaded multiprocessor architectures
PNPM '97 Proceedings of the 6th International Workshop on Petri Nets and Performance Models
The need for adaptive dynamic thread scheduling
High performance scientific and engineering computing
Predictable performance in SMT processors
Proceedings of the 1st conference on Computing frontiers
A scalable, clustered SMT processor for digital signal processing
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
Dynamically Controlled Resource Allocation in SMT Processors
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Architecture optimization for multimedia application exploiting data and thread-level parallelism
Journal of Systems Architecture: the EUROMICRO Journal
Evaluating the impact of simultaneous multithreading on network servers using real hardware
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Journal of Parallel and Distributed Computing
Design of adaptive multiprocessor on chip systems
Proceedings of the 20th annual conference on Integrated circuits and systems design
International Journal of High Performance Computing and Networking
An adaptive resource partitioning algorithm for SMT processors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Improving SMT performance scheduling processes
EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
Low power microprocessor design for embedded systems
ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part IV
PATMOS'06 Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
MorphCore: An Energy-Efficient Microarchitecture for High Performance ILP and High Throughput TLP
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.01 |
In this paper, we propose a multithreaded processor architecture which improves machine throughput. In our processor architecture, instructions from different threads (not a single thread) are issued simultaneously to multiple functional units, and these instructions can begin execution unless there are functional unit conflicts. This parallel execution scheme greatly improves the utilization of the functional unit. Simulation results show that by executing two and four threads in parallel on a nine-functional-unit processor, a 2.02 and a 3.72 times speed-up, respectively, can be achieved over a conventional RISC processor.Our architecture is also applicable to the efficient execution of a single loop. In order to control functional unit conflicts between loop iterations, we have developed a new static code scheduling technique. Another loop execution scheme, by using the multiple control flow mechanism of our architecture, makes it possible to parallelize loops which are difficult to parallelize in vector or VLIW machines.