Memory access coalescing: a technique for eliminating redundant memory accesses
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The SimpleScalar tool set, version 2.0
ACM SIGARCH Computer Architecture News
Bidwidth analysis with application to silicon compilation
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Exploiting superword level parallelism with multimedia instruction sets
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
C Compiler Design for an Industrial Network Processor
OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
ARM Architecture Reference Manual
ARM Architecture Reference Manual
ARM System Architecture
NetBench: a benchmarking suite for network processors
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
A Representation for Bit Section Based Analysis and Optimization
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Data Compression Transformations for Dynamically Allocated Data Structures
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Fast Subword Permutation Instructions Using Omega and Flip Network Stages
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
CommBench-a telecommunications benchmark for network processors
ISPASS '00 Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software
Bitwidth aware global register allocation
POPL '03 Proceedings of the 30th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Simple offset assignment in presence of subword data
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Bit level types for high level reasoning
Proceedings of the 14th ACM SIGSOFT international symposium on Foundations of software engineering
Runtime resource allocation in multi-core packet processing systems
HPSR'09 Proceedings of the 15th international conference on High Performance Switching and Routing
Optimal bitwise register allocation using integer linear programming
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Speculative subword register allocation in embedded processors
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Enhanced bitwidth-aware register allocation
CC'06 Proceedings of the 15th international conference on Compiler Construction
Hi-index | 0.02 |
Programs that manipulate data at subword level, i.e. bit sections within a word, are common place in the embedded domain. Examples of such applications include media processing as well as network processing codes. These applications spend significant amounts of time packing and unpacking narrow width data into memory words. The execution time and memory overhead of packing and unpacking operations can be greatly reduced by providing direct instruction set support for manipulating bit sections.In this paper we present the Bit Section eXtension (BSX) to the ARM instruction set. We selected the ARM processor for this research because it is one of the most popular embedded processor which is also being used as the basis of building many commercial network processing architectures. We present the design of BSX instructions and their encoding into the ARM instruction set. We have incorporated the implementation of BSX into the Simplescalar ARM simulator from Michigan. Results of experiments with programs from various benchmark suites show that by using BSX instructions the total number of instructions executed at runtime by many transformed functions are reduced by 4.26% to 27.27% and their code sizes are reduced by `1.27% to 21.05%.