As the semiconductor industry advances, more and more transistors can be integrated onto a single chip. However, the prevailing software programming model does not fit the parallelism requirements of CMP (Chip Multi-Processor) based architectures. Communication between cores becomes a serious problem that degrades performance. This paper proposes an approach called API (Architecture of Parallelism on Instructions), which scans the source code of a program, analyzes its data dependencies, and clusters dependent instructions together. Instructions with no dependencies between them can then be issued directly, in parallel, to different cores. API also provides a global register file so that programs execute efficiently on CMP chips. In our experiments, we compared the execution time of API with that of a traditional architecture using the SPEC CPU2000 benchmarks. The results show that API needs only 49 percent of the instruction clock cycles required by the original architecture. Moreover, only 4 cores are needed to approach the best performance.
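The abstract does not say how dependent instructions are clustered or how clusters are mapped to cores, so the following is only a minimal, hypothetical sketch of the general idea, not the paper's method. It assumes a toy instruction format of `(destination register, source registers)`, assumes registers are already renamed so that only true read-after-write (RAW) dependencies matter, merges RAW-dependent instructions with a union-find structure, and then spreads the resulting independent clusters across cores with a greedy least-loaded heuristic. All names (`cluster_dependent`, `assign_to_cores`, the register names) are illustrative, and the global register file is not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical instruction format: (destination register, source registers).
Instr = Tuple[str, Tuple[str, ...]]


class UnionFind:
    """Disjoint-set forest used to merge instructions into dependency clusters."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_dependent(instrs: List[Instr]) -> List[List[int]]:
    """Group instruction indices so every RAW-dependent pair shares a cluster.

    Assumption: registers are already renamed, so WAR/WAW hazards are ignored
    and only true read-after-write dependencies are tracked.
    """
    uf = UnionFind(len(instrs))
    last_writer: Dict[str, int] = {}
    for i, (dest, srcs) in enumerate(instrs):
        for reg in srcs:
            if reg in last_writer:  # RAW dependency on an earlier instruction
                uf.union(last_writer[reg], i)
        last_writer[dest] = i
    groups: Dict[int, List[int]] = defaultdict(list)
    for i in range(len(instrs)):
        groups[uf.find(i)].append(i)
    # Order clusters by their first instruction for deterministic output.
    return sorted(groups.values(), key=lambda g: g[0])


def assign_to_cores(clusters: List[List[int]], num_cores: int) -> List[List[List[int]]]:
    """Greedily give each cluster (largest first) to the currently least-loaded core."""
    cores: List[List[List[int]]] = [[] for _ in range(num_cores)]
    load = [0] * num_cores
    for cluster in sorted(clusters, key=len, reverse=True):
        core = min(range(num_cores), key=lambda c: load[c])
        cores[core].append(cluster)
        load[core] += len(cluster)
    return cores


# Hypothetical straight-line program.
program: List[Instr] = [
    ("r1", ()),        # 0: r1 = const
    ("r2", ("r1",)),   # 1: r2 = f(r1)
    ("r3", ()),        # 2: r3 = const
    ("r4", ("r3",)),   # 3: r4 = g(r3)
    ("r5", ("r2",)),   # 4: r5 = h(r2)
    ("r6", ()),        # 5: r6 = const
]

if __name__ == "__main__":
    clusters = cluster_dependent(program)
    print(clusters)                      # [[0, 1, 4], [2, 3], [5]]
    print(assign_to_cores(clusters, 2))  # [[[0, 1, 4]], [[2, 3], [5]]]
```

Because no RAW edge crosses cluster boundaries in this sketch, each cluster could in principle run on its own core without waiting on another; the greedy assignment is just one simple load-balancing choice, and a real scheduler would also weigh instruction latencies and communication through the shared register file.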