Extracting and improving microarchitecture performance on reconfigurable architectures

  • Authors:
  • Shobana Padmanabhan;Phillip Jones;David V. Schuehler;Scott J. Friedman;Praveen Krishnamurthy;Huakai Zhang;Roger Chamberlain;Ron K. Cytron;Jason Fritts;John W. Lockwood

  • Affiliations:
  • Department of Computer Science and Engineering, Washington University, St. Louis, MO (all authors)

  • Venue:
  • International Journal of Parallel Programming - Special issue: The next generation software program
  • Year:
  • 2005

Abstract

Applications for constrained embedded systems require careful attention to the match between the application and the support offered by an architecture, at both the ISA and microarchitecture levels. Generic processors, such as ARM and PowerPC, are inexpensive, but for a given application they often overprovision in areas that are unimportant to that application's performance. Moreover, while application-specific, customized logic could dramatically improve an application's performance, that approach is typically too expensive to justify for most applications. In this paper, we describe our experience using reconfigurable architectures to develop an understanding of an application's performance and to enhance its performance with respect to customized, constrained logic. We begin with a standard ISA currently in use for embedded systems. We modify its core to measure performance characteristics, obtaining a system that provides cycle-accurate timings and presents results in the style of gprof, but with no software overhead. We then provide cache-behavior statistics that are typically unavailable in a generic processor. In contrast with simulation, our approach executes the program at full speed and delivers statistics based on the actual behavior of the cache subsystem. Finally, in response to the performance profile developed on our platform, we evaluate various uses of the FPGA-realized instruction and data caches in terms of the application's performance.
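
As a rough illustration of what a gprof-style report built from hardware counters might look like, the following Python sketch turns per-function cycle counts into a flat profile and computes a cache hit rate from hit and miss counts. It is not the authors' tooling: the function names `flat_profile` and `hit_rate`, the counter layout, and all numeric values are hypothetical and chosen only for the example.

```python
def flat_profile(cycles_by_function):
    """Return gprof-style rows (percent of total, cycles, name),
    sorted by cycle count in descending order.

    cycles_by_function: hypothetical mapping of function name to the
    cycle count read back from a hardware counter.
    """
    total = sum(cycles_by_function.values())
    ordered = sorted(cycles_by_function.items(),
                     key=lambda item: item[1], reverse=True)
    rows = []
    for name, cycles in ordered:
        percent = 100.0 * cycles / total if total else 0.0
        rows.append((round(percent, 2), cycles, name))
    return rows


def hit_rate(hits, misses):
    """Fraction of cache accesses that hit; 0.0 when there were no accesses."""
    accesses = hits + misses
    return hits / accesses if accesses else 0.0


if __name__ == "__main__":
    # Hypothetical counter values, for illustration only.
    counters = {"fft": 600_000, "quantize": 300_000, "main": 100_000}
    print("  %time      cycles  name")
    for percent, cycles, name in flat_profile(counters):
        print(f"{percent:7.2f}  {cycles:>10}  {name}")
    print(f"data-cache hit rate: {hit_rate(hits=9_500, misses=500):.3f}")
```

Because the counts in such a setup would come from counters in the modified core rather than from instrumented code, the reporting step runs entirely after the program finishes and adds nothing to the measured run.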