Dynamic code footprint optimization for the IBM Cell Broadband Engine

  • Authors:
Tobias Werth, Tobias Flossmann, Michael Klemm, Dominic Schell, Ulrich Weigand, Michael Philippsen

  • Affiliations:
University of Erlangen-Nuremberg, Computer Science Department, Programming Systems Group, Martensstr. 3, 91058 Erlangen, Germany (Werth, Flossmann, Klemm, Schell, Philippsen); IBM Deutschland Research & Development GmbH, Linux on Cell B. E. Development, Schönaicher Str. 220, 71032 Böblingen, Germany (Weigand)

  • Venue:
  • IWMSE '09 Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering
  • Year:
  • 2009


Abstract

Multicore designers often add a small local memory close to each core to speed up access and to reduce off-chip I/O. This approach, however, puts a burden on the programmer, the compiler, and the runtime system: since this memory lacks hardware support (cache logic, MMU, etc.), it must be managed in software to exploit its performance potential. The IBM Cell Broadband Engine (Cell B. E.) is extreme in this respect, since each of the parallel cores can directly address code and data only in its own local memory. Overlay techniques from the 1970s solve this problem, but with well-known drawbacks: the programmer must manually divide the program into overlays, and the largest overlay determines how much data the application can work with. In our approach, programmers no longer need to cut overlays. Instead, we automatically fragment the program at runtime and load small code snippets into a code cache that resides in the local stores and is supervised by a garbage collector. Since our loader never loads code that is not needed for execution, the code cache can be much smaller (by up to 70%) than the original program size. Applications can therefore work on larger data sets, i.e., bigger problems. Our loader is highly efficient and slows down applications by less than 5% on average. It can load any native code without pre-processing or changes in the software tool chain.
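The mechanism the abstract describes, loading code fragments on demand into a fixed-size software-managed cache and reclaiming space when it fills up, can be modeled abstractly. The following sketch is purely illustrative: all names, the byte-based accounting, and the LRU eviction policy are assumptions for exposition, not the authors' actual design (the paper's collector and loader operate on real SPU-local-store code).

```python
# Conceptual model of an on-demand code cache with eviction.
# Fragments are "loaded" only when execution first needs them; when the
# cache is full, the least-recently-used fragment is evicted to make room.
from collections import OrderedDict

class CodeCache:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.fragments = OrderedDict()  # name -> size, kept in LRU order

    def call(self, name, size, loader):
        """Ensure the fragment is resident; load (and evict) as needed.

        Returns True on a cache miss (fragment had to be loaded)."""
        if name in self.fragments:
            self.fragments.move_to_end(name)  # mark as recently used
            return False                      # hit: no load required
        while self.used + size > self.capacity:
            _, evicted_size = self.fragments.popitem(last=False)
            self.used -= evicted_size         # evict least-recently-used
        self.fragments[name] = size
        self.used += size
        loader(name)                          # fetch fragment into "local store"
        return True                           # miss: fragment was loaded

# Usage: a 100-byte cache, recording which fragments get loaded.
cache = CodeCache(capacity_bytes=100)
loads = []
cache.call("init", 40, loads.append)    # miss: loaded
cache.call("kernel", 50, loads.append)  # miss: loaded
cache.call("init", 40, loads.append)    # hit: already resident
cache.call("report", 30, loads.append)  # miss: evicts "kernel" (LRU)
```

The key property mirrored here is the one the abstract claims: only code actually reached by execution occupies cache space, so the cache can be far smaller than the whole program.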