The expandable split window paradigm for exploiting fine-grain parallelsim
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Application-specific heterogeneous multiprocessor synthesis using differential-evolution
Proceedings of the 11th international symposium on System synthesis
Dynamic vectorization: a mechanism for exploiting far-flung ILP in ordinary programs
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
From recursion to iteration: what are the optimizations?
PEPM '00 Proceedings of the 2000 ACM SIGPLAN workshop on Partial evaluation and semantics-based program manipulation
Notes on recursion elimination
Communications of the ACM
Proceedings of the 38th annual Design Automation Conference
Structure of Computers and Computations
Structure of Computers and Computations
Short Vector Code Generation for the Discrete Fourier Transform
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
The Software Optimization Cookbook
The Software Optimization Cookbook
Software Vectorization Handbook, The: Applying Intel Multimedia Extensions for Maximum Performance
Software Vectorization Handbook, The: Applying Intel Multimedia Extensions for Maximum Performance
The future of multiprocessor systems-on-chips
Proceedings of the 41st annual Design Automation Conference
VLSID '05 Proceedings of the 18th International Conference on VLSI Design held jointly with 4th International Conference on Embedded Systems Design
An Empirical Study On the Vectorization of Multimedia Applications for Multimedia Extensions
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Compiling for vector-thread architectures
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Highly-cited ideas in system codesign and synthesis
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
On the exploitation of loop-level parallelism in embedded applications
ACM Transactions on Embedded Computing Systems (TECS)
Microprocessors & Microsystems
Hi-index | 0.00 |
Embedded processors have been increasingly exploiting hardware parallelism. Vector units, multiple processors or cores, hyper-threading, special-purpose accelerators such as DSPs or cryptographic engines, or a combination of the above have appeared in a number of processors. They serve to address the increasing performance requirements of modern embedded applications. How this hardware parallelism can be exploited by applications is directly related to the amount of parallelism inherent in a target application. In this paper we evaluate the performance potential of different types of parallelism, viz., true thread-level parallelism, speculative thread-level parallelism and vector parallelism, when executing loops. Applications from the industry-standard EEMBC 1.1, EEMBC 2.0 and the MiBench embedded benchmark suites are analyzed using the Intel C compiler. The results show what can be achieved today, provide upper bounds on the performance potential of different types of thread parallelism, and point out a number of issues that need to be addressed to improve performance. The latter include parallelization of libraries such as libc and design of parallel algorithms to allow maximal exploitation of parallelism. The results also point to the need for developing new benchmark suites more suitable to parallel compilation and execution.