On the exploitation of loop-level parallelism in embedded applications

Authors:
Arun Kejariwal; Alexander V. Veidenbaum; Alexandru Nicolau; Milind Girkar; Xinmin Tian; Hideki Saito
Affiliations:
University of California, Irvine, CA, USA; University of California, Irvine, CA, USA; University of California, Irvine, CA, USA; Intel Corporation; Intel Corporation; Intel Corporation
Venue:
ACM Transactions on Embedded Computing Systems (TECS)
Year:
2009

Abstract

Advances in silicon technology have enabled increasing support for hardware parallelism in embedded processors. Vector units, multiple processors or cores, multithreading, special-purpose accelerators such as DSPs or cryptographic engines, or a combination of these have appeared in a number of processors. They serve to address the increasing performance requirements of modern embedded applications. The extent to which the available hardware parallelism can be exploited depends directly on the amount of parallelism inherent in the given application and on the congruence between the granularity of hardware and application parallelism. This paper discusses how loop-level parallelism in embedded applications can be exploited in hardware and software. Specifically, it evaluates the efficacy of automatic loop parallelization and the performance potential of different types of parallelism, viz., true thread-level parallelism (TLP), speculative thread-level parallelism, and vector parallelism, when executing loops. Additionally, it discusses the interaction between parallelization and vectorization. Applications from the industry-standard EEMBC®¹ 1.1 and EEMBC 2.0 suites and from the academic MiBench embedded benchmark suite are analyzed using the Intel®² C compiler. The results show the performance that can be achieved today on real hardware using a production compiler, provide upper bounds on the performance potential of the different types of thread-level parallelism, and point out a number of issues that must be addressed to improve performance. These issues include the parallelization of libraries such as libc and the design of parallel algorithms that allow maximal exploitation of parallelism. The results also point to the need for new benchmark suites better suited to parallel compilation and execution.

¹ Other names and brands may be claimed as the property of others.
² Intel is a trademark of Intel Corporation or its subsidiaries in the United States and other countries.
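As a minimal illustration of the distinction the abstract draws between loops whose iterations are independent and loops that are not, the C sketch below contrasts two such loops. It is not code from the paper: the function names `scale_rows` and `prefix_sum` are hypothetical, and the OpenMP `parallel for` and `simd` pragmas are assumed here only as one portable way to request thread-level and vector parallelism explicitly, whereas the paper evaluates automatic parallelization by the Intel C compiler. When OpenMP support is not enabled, the pragmas are ignored and both functions run serially with identical results.

```c
#include <stddef.h>

/* Hypothetical DOALL loop nest: every iteration writes a distinct element
   of `out` and reads only `in`, so there are no cross-iteration
   dependences. The outer loop can be distributed across threads
   (thread-level parallelism) and the inner loop mapped onto SIMD lanes
   (vector parallelism). `restrict` tells the compiler that the two arrays
   do not overlap, removing a potential aliasing obstacle. */
void scale_rows(size_t rows, size_t cols, float *restrict out,
                const float *restrict in, float k)
{
    #pragma omp parallel for
    for (size_t i = 0; i < rows; i++) {
        #pragma omp simd
        for (size_t j = 0; j < cols; j++)
            out[i * cols + j] = k * in[i * cols + j];
    }
}

/* Hypothetical loop with a loop-carried (true) dependence: iteration i
   reads a[i - 1], which iteration i - 1 has just written. Running these
   iterations concurrently as written could change the result, so the loop
   is not DOALL; exploiting parallelism here requires speculation or a
   different algorithm, such as a parallel prefix (scan). */
void prefix_sum(size_t n, int *a)
{
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```

For example, calling `scale_rows(2, 3, out, in, 2.0f)` with `in` = {1, 2, 3, 4, 5, 6} fills `out` with {2, 4, 6, 8, 10, 12}, and `prefix_sum(5, a)` turns {1, 1, 1, 1, 1} into {1, 2, 3, 4, 5}. With GCC or Clang, compiling with `-fopenmp` activates the pragmas.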