In multiprocessors, performance improvement is typically achieved by exploiting parallelism at a fixed granularity, such as instruction-level, task-level, or data-level parallelism. We introduce a new reconfiguration mechanism that allows these granularities to vary, optimizing resource utilization in addition to improving performance. Our reconfigurable multiprocessor, QuadroCore, combines the advantages of reconfigurability and parallel processing. This article presents a unified hardware-software approach to the design of QuadroCore. The design flow is enabled by compiler-driven reconfiguration, which matches application-specific characteristics to a fixed set of architectural variations. A special reconfiguration mechanism has been developed that alters the architecture within a single clock cycle. QuadroCore has been implemented on a Xilinx XC2V6000 FPGA for functional validation and in UMC’s 90nm standard-cell technology for performance estimation. A diverse set of applications has been mapped onto the reconfigurable multiprocessor to meet orthogonal performance goals in terms of time and power. Speedup measurements show a 2--11 times performance increase over a single processor. Additionally, the reconfiguration scheme has been applied to save power in data-parallel applications. Gate-level simulations have been performed to measure the power-performance trade-offs for two computationally complex applications. The power reports confirm that this reconfiguration scheme yields power savings in the range of 15--24%.
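The idea of compiler-driven reconfiguration can be illustrated with a minimal sketch: the compiler profiles a kernel's parallelism characteristics and selects one of a fixed set of architectural modes before execution. The mode names, profile fields, and selection thresholds below are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch of compiler-driven reconfiguration: match
# application-specific characteristics to a fixed set of architectural
# variations. All names and thresholds here are illustrative assumptions.

from dataclasses import dataclass

# Fixed set of architectural variations (illustrative).
MODES = ("MIMD", "SIMD", "VLIW-like")

@dataclass
class KernelProfile:
    data_parallel_fraction: float  # share of work in data-parallel loops
    ilp: float                     # average independent ops per cycle

def select_mode(profile: KernelProfile) -> str:
    """Pick an architectural mode for a kernel based on its profile."""
    if profile.data_parallel_fraction > 0.6:
        # Lock-step execution lets redundant fetch/decode activity be
        # gated off -- the kind of configuration that saves power on
        # data-parallel code.
        return "SIMD"
    if profile.ilp > 2.0:
        # Enough independent operations to fill wide issue slots.
        return "VLIW-like"
    # Default: run the cores as independent processors.
    return "MIMD"

if __name__ == "__main__":
    print(select_mode(KernelProfile(0.8, 1.2)))  # data-parallel kernel
    print(select_mode(KernelProfile(0.1, 3.0)))  # ILP-heavy kernel
    print(select_mode(KernelProfile(0.1, 1.0)))  # irregular kernel
```

In the actual architecture the selected configuration is applied by the single-cycle reconfiguration mechanism at run time; this sketch only models the compile-time matching step.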