A cost effective architecture for vectorizable numerical and multimedia applications

  • Authors:
  • Francisca Quintana; Jesus Corbal; Roger Espasa; Mateo Valero

  • Affiliations:
  • Departamento de Informatica y Sistemas, Universidad de Las Palmas de Gran Canaria, Islas Canarias, Spain; Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain; Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain; Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain

  • Venue:
  • Proceedings of the Thirteenth Annual ACM Symposium on Parallel Algorithms and Architectures
  • Year:
  • 2001

Abstract

This paper analyzes the performance of vector-dominated regions of code in numerical and multimedia applications on a superscalar+vector architecture and compares it with that of an 8-way superscalar processor. Splitting a program's execution into scalar and vector regions allows us to show that (1) as expected, the vector unit is much better than the wide-issue superscalar at executing the vector-dominated regions of the code; and (2) on the scalar regions, the 8-way superscalar, although better than a 4-way superscalar, is clearly not worth the extra complexity in terms of additional transistors and potential cycle-time limitations. Overall, the vector-enhanced superscalar is from 6% to 303% better than the 8-way superscalar. We also present detailed data on the performance of the memory system, which is usually the key limiting factor when running numerical and multimedia applications. Finally, we evaluate two additional cache designs that try to alleviate the problems created by non-unit-stride memory references.
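
To illustrate why non-unit-stride references stress a cache, the following minimal sketch counts how many distinct cache lines a strided access stream touches. It is not taken from the paper: the function name `cache_lines_touched`, the 8-byte element size, and the 32-byte line size are hypothetical values chosen only for illustration, and the model ignores capacity, associativity, and the specific cache designs the paper evaluates.

```python
def cache_lines_touched(num_elements, stride, element_bytes=8, line_bytes=32):
    """Count distinct cache lines touched by a strided access stream.

    Assumptions (illustrative only, not from the paper): the stream starts
    at address 0, elements are `element_bytes` wide, and the cache line is
    `line_bytes` wide. `stride` is measured in elements.
    """
    lines = set()
    for i in range(num_elements):
        address = i * stride * element_bytes   # byte address of the i-th access
        lines.add(address // line_bytes)       # index of the line holding it
    return len(lines)


# Unit stride: four consecutive 8-byte elements share each 32-byte line.
unit_stride_lines = cache_lines_touched(64, stride=1)

# Stride of 4 elements: every access falls in a different line,
# so only 8 of the 32 bytes fetched per line are actually used.
strided_lines = cache_lines_touched(64, stride=4)

print(unit_stride_lines, strided_lines)
```

Under these assumed sizes, the same 64 accesses touch four times as many lines at stride 4 as at stride 1, which is the kind of wasted bandwidth that stride-aware cache designs aim to reduce.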