In this paper, we present a case study of the execution-time characteristics of several popular commercial audio and video applications on a state-of-the-art microprocessor, the Intel Pentium 4. The study uses the on-chip performance counters of the Pentium 4 with actual real-world workloads. Although the Pentium 4 is capable of executing 3-4 instructions per cycle, we observed that commercial audio and video applications take between 1.4 and 3.5 cycles per instruction to execute. Despite large caches and sophisticated out-of-order execution techniques, the average cycles per instruction is higher than that of a predecessor such as the Pentium II. This indicates that while clock frequency has improved, real speedups are not scaling with it. The performance of the multimedia programs is compared with the execution characteristics of SPEC CPU 2000 programs. The performance impact of branch predictors, caches, and trace caches on the Pentium 4 is analyzed for both multimedia and SPEC CPU applications.
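The gap between the peak rate and the measured rate can be made concrete with a little arithmetic. The following is a minimal sketch, not the authors' tooling: it assumes two raw counter readings (elapsed cycles and retired instructions) are already available, and the function names and the sample counter values are hypothetical. The peak of 3 instructions per cycle is taken from the lower end of the 3-4 range quoted above.

```python
def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction from two raw counter readings."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def fraction_of_peak(cpi_value: float, peak_ipc: float) -> float:
    """Achieved instructions per cycle as a fraction of the peak rate."""
    if cpi_value <= 0 or peak_ipc <= 0:
        raise ValueError("cpi_value and peak_ipc must be positive")
    return (1.0 / cpi_value) / peak_ipc


if __name__ == "__main__":
    # Hypothetical counter readings, chosen only to land on the
    # upper end of the 1.4-3.5 CPI range reported in the abstract.
    sample_cpi = cpi(cycles=3_500_000, instructions=1_000_000)
    print(f"CPI: {sample_cpi:.2f}")

    # Assumed peak of 3 instructions per cycle (low end of 3-4).
    for value in (1.4, 3.5):
        share = fraction_of_peak(value, peak_ipc=3.0)
        print(f"CPI {value}: {share:.1%} of peak throughput")
```

Under these assumptions, the reported CPI range corresponds to well under a third of the peak instruction throughput, which is the shortfall the abstract describes.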