An overview of the ECO project

  • Authors:
  • Jacqueline Chame;Chun Chen;Pedro Diniz;Mary Hall;Yoon-Ju Lee;Robert F. Lucas

  • Affiliations:
  • University of Southern California, Information Sciences Institute, Marina del Rey, CA (all authors)

  • Venue:
  • IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
  • Year:
  • 2006

Abstract

In this paper, we describe a compilation system that automates much of the performance tuning currently done manually by application programmers interested in high performance. Our approach combines compiler models and heuristics with guided empirical search to take advantage of their complementary strengths. The models and heuristics limit the search to a small number of candidate implementations, and the empirical results give the compiler the most accurate information for selecting among candidates and tuning optimization parameter values. The overall approach can be employed to alleviate some of the performance problems that lead to inefficiencies in key applications today: register pressure, cache conflict misses, and the trade-off between synchronization, parallelism, and locality in SMPs. The main focus of the paper is an algorithm for simultaneously optimizing across multiple levels of the memory hierarchy for dense-matrix computations. We have developed an initial compiler implementation and present automatically generated results on matrix multiply. Results on two architectures, SGI R10000 and Sun UltraSparc IIe, outperform the native compiler and either outperform or achieve performance comparable to the ATLAS self-tuning library and the hand-tuned vendor BLAS library. This paper also describes other components of the ECO system, including supporting tools and experiments with programmer-guided performance tuning. This approach has provided a foundation for a general framework for systematic optimization of domain-specific applications. Specifically, we are developing an optimization system for signal and image processing that exploits signal properties, and we are using machine learning and a knowledge-rich representation to optimize molecular dynamics simulation.
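
The combination of models with guided empirical search described in the abstract can be illustrated with a minimal sketch. This is not the ECO algorithm or its compiler implementation; it is a toy example under stated assumptions. The capacity model (three square tiles must fit in a cache of `cache_elems` elements), the restriction to power-of-two tile sizes, and the names `tiled_matmul`, `model_candidates`, and `empirical_search` are all hypothetical choices made for illustration.

```python
import time


def tiled_matmul(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists) using square tiles."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + tile, n)):
                        a_ik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += a_ik * row_b[j]
    return c


def model_candidates(n, cache_elems):
    """Model step (assumed): keep power-of-two tile sizes whose three
    tiles (one each from A, B, and C) fit in the assumed cache capacity."""
    sizes = []
    t = 2
    while t <= n:
        if 3 * t * t <= cache_elems:
            sizes.append(t)
        t *= 2
    return sizes


def empirical_search(n, cache_elems, repeats=2):
    """Empirical step: time only the candidates the model kept and
    return the fastest tile size together with all measured timings."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for tile in model_candidates(n, cache_elems):
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            tiled_matmul(a, b, n, tile)
            best = min(best, time.perf_counter() - start)
        timings[tile] = best
    if not timings:
        raise ValueError("the model rejected every candidate tile size")
    return min(timings, key=timings.get), timings
```

Calling `empirical_search(32, 800)` times only the tile sizes the model admits (2, 4, 8, and 16 under these assumptions) instead of every possible size. The selected tile depends on the machine, and in interpreted Python the timings mostly reflect interpreter overhead rather than cache behavior, so the sketch shows the structure of the search rather than realistic tuning results.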