A performance study of software and hardware data prefetching schemes
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
Exploiting Image Processing Locality in Cache Pre-fetching
HIPC '98 Proceedings of the Fifth International Conference on High Performance Computing
Intelligent RAM (IRAM): the Industrial Setting, Applications, and Architectures
ICCD '97 Proceedings of the 1997 International Conference on Computer Design (ICCD '97)
Exploiting Cache in Multimedia
ICMCS '99 Proceedings of the IEEE International Conference on Multimedia Computing and Systems - Volume 2
Data cache management on EPIC architecture: optimizing memory access for image processing
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
Stream image processing on a dual-core embedded system
SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
This research explores ways to improve the performance of multimedia processors [1]. Cache memory performance plays an increasingly critical role in computer systems, since the gap between processor speed and main memory speed keeps widening. Integrating SIMD extensions into the computational units (such as Pentium MMX, HP MAX2 or UltraSparc VIS) to improve parallel computation on image pixels has been the main answer to the heavy workloads of multimedia applications [2]. Moreover, the workload of multimedia applications [3] has a strong impact on cache memory performance, since the locality of memory references in multimedia programs differs from that of traditional programs. As is widely known, programs exhibit two main kinds of locality: spatial and temporal. However, as stated in [1], multimedia applications seem to present a new kind of locality, called 2D-spatial locality (i.e. after an access to a given address, there is a high probability that future accesses will fall in a two-dimensional neighborhood of it). For this reason, a standard cache memory organization achieves poorer performance when used for multimedia. To achieve an overall performance improvement on specialized multimedia processors, further architectural modifications to the memory hierarchy and its management are needed. These could be coupled with the recent idea of associating programmable components with memory, separate from the main processor, such as IRAM [4].
The first goal of this research is to prove that common multimedia applications exhibit 2D-spatial locality. To do this, we developed a benchmark including the most common multimedia and image processing applications. Many trace-driven simulations confirm the hypothesis [5][6].
We then explore techniques able to exploit this locality to improve cache performance.
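To make the notion of a two-dimensional neighborhood concrete, the following minimal sketch assumes a row-major image with one byte per pixel and a 32-byte cache block. These parameters and the function names are illustrative assumptions, not taken from the cited studies. It lists the cache blocks touched by the 3x3 neighborhood of a pixel.

```python
def pixel_address(row, col, width, bytes_per_pixel=1, base=0):
    """Byte address of a pixel in a row-major image (assumed layout)."""
    return base + (row * width + col) * bytes_per_pixel


def blocks_touched(row, col, width, height, block_size=32, radius=1):
    """Cache-block indices covered by the square neighborhood of (row, col)."""
    blocks = set()
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r, c = row + dr, col + dc
            # Skip neighbors that fall outside the image.
            if 0 <= r < height and 0 <= c < width:
                blocks.add(pixel_address(r, c, width) // block_size)
    return blocks


if __name__ == "__main__":
    print(sorted(blocks_touched(100, 100, width=640, height=480)))
```

For a 640-pixel-wide image, the neighborhood of pixel (100, 100) spans three blocks that lie 20 blocks apart (one image row each), so a scheme that only considers the next sequential block would not cover the rows above and below.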
Among the various techniques used to improve cache memory performance, prefetching has been one of the most studied and apparently most promising (see [7][8], which, however, make no assumption about 2D spatial locality). Prefetching techniques can be broadly classified by whether they are implemented in software or in hardware, although some techniques take advantage of a combined software/hardware implementation [9]. A widely explored approach is hardware prefetching, which loads data into the cache before they are referenced. However, existing hardware prefetching approaches miss part of the potential performance improvement, since they are not tailored to multimedia locality. In this research we propose novel, effective approaches to hardware prefetching for image processing programs in multimedia. In particular, we address multimedia image processing, including algorithms such as the widespread MPEG-2 decoding, used for decompression of audio/video streams, and typical image processing operations such as convolution for image filtering and edge chain coding, used as a pre-processing step in many image analysis tasks. We omit evaluation on sound data (such as MP3 decompression or speech recognition), since these workloads exhibit typical array spatial locality and standard prefetching techniques perform well enough on them. The algorithms were selected for their wide use and their different data addressing schemes. Convolution is dominated by a regular data addressing scheme that can be predicted a priori. Edge chain coding, by contrast, is heavily data dependent: the address sequence of data references depends on the image and cannot be statically predicted, so software prefetching techniques (based on compile-time prediction of future accesses) are not suitable in this case.
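The contrast between the two addressing schemes can be sketched as follows. This is a toy illustration under stated assumptions: a row-major image, a 3x3 kernel, and a deliberately simplified edge follower (it moves to the first unvisited 8-connected edge pixel and records only the pixels it lands on). It is not the edge chain coding implementation used in the benchmark, and all names are hypothetical.

```python
def convolution_trace(width, height, k=3):
    """Pixel offsets read by a k x k convolution; depends only on the sizes."""
    half = k // 2
    trace = []
    for r in range(half, height - half):
        for c in range(half, width - half):
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    trace.append((r + dr) * width + (c + dc))
    return trace


# 8-connected neighbor offsets, scanned in a fixed order.
DIRECTIONS = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
              (0, -1), (1, -1), (1, 0), (1, 1)]


def edge_follow_trace(image, start):
    """Offsets of the edge pixels followed; depends on the pixel values."""
    height, width = len(image), len(image[0])
    r, c = start
    visited = {(r, c)}
    trace = [r * width + c]
    while True:
        for dr, dc in DIRECTIONS:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < height and 0 <= nc < width
            if inside and image[nr][nc] and (nr, nc) not in visited:
                r, c = nr, nc
                visited.add((r, c))
                trace.append(r * width + c)
                break
        else:
            # No unvisited edge pixel among the neighbors: the chain ends.
            return trace


if __name__ == "__main__":
    horizontal = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
    diagonal = [[0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
    print(convolution_trace(4, 4)[:9])
    print(edge_follow_trace(horizontal, (1, 0)))
    print(edge_follow_trace(diagonal, (1, 0)))
```

The convolution trace is identical for every image of a given size, so it could in principle be computed at compile time. The two edge traces start at the same pixel of same-sized images yet diverge immediately, because each step depends on the data just read.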
MPEG-2 exhibits a combination of regular addressing and data dependency.
Typical hardware prefetching techniques are not suitable in this context: techniques based on one-block-lookahead [10] exploit only 1D spatial locality, while adaptive techniques cannot cope with the data dependency of some image processing algorithms.
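The 1D limitation of one-block-lookahead can be illustrated with a small simulation. The sketch below makes strong simplifying assumptions: an unbounded cache, prefetches that complete instantly, the always-prefetch variant of one-block-lookahead, and an image width that is a multiple of the block size. The neighborhood policy is a hypothetical example of exploiting 2D locality, not the scheme proposed in this work.

```python
def count_misses(trace, block_size, prefetch):
    """Count demand misses; prefetch(block) lists blocks loaded alongside it."""
    cached = set()
    misses = 0
    for address in trace:
        block = address // block_size
        if block not in cached:
            misses += 1
            cached.add(block)
        # Always-prefetch: issue prefetches on every reference, hit or miss.
        cached.update(prefetch(block))
    return misses


def no_prefetch(block):
    return []


def one_block_lookahead(block):
    # Fetch only the next sequential block (1D spatial locality).
    return [block + 1]


def make_neighborhood_prefetch(blocks_per_row):
    """Hypothetical 2D policy: also fetch the blocks one image row up and down."""
    def policy(block):
        return [block + 1, block - blocks_per_row, block + blocks_per_row]
    return policy


if __name__ == "__main__":
    width, height, block_size = 64, 8, 16
    row_walk = [c for c in range(width)]                   # along row 0
    column_walk = [r * width + 5 for r in range(height)]   # down column 5
    two_d = make_neighborhood_prefetch(width // block_size)
    for name, trace in (("row", row_walk), ("column", column_walk)):
        print(name,
              count_misses(trace, block_size, no_prefetch),
              count_misses(trace, block_size, one_block_lookahead),
              count_misses(trace, block_size, two_d))
```

Under these assumptions, one-block-lookahead removes all but the first miss on the row walk but gives no benefit on the column walk, where every reference lands one image row (four blocks) away from the previous one.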