Experiences with optimizing two stream-based applications for cluster execution

Authors:
Yavor Angelov;Umakishore Ramachandran;Kenneth Mackenzie;James Matthew Rehg;Irfan Essa
Affiliations:
College of Computing, Georgia Institute of Technology, Atlanta, GA 30332-0280, USA (all authors)
Venue:
Journal of Parallel and Distributed Computing
Year:
2005

Abstract

We explore optimization strategies and the resulting performance of two stream-based video applications, Video Textures and Color Tracker, on a cluster of SMPs. These two applications represent a class of emerging applications, which we call "stream-based applications". Such applications are sensitive both to the latency of individual results and to overall throughput. Improving both latency and throughput requires non-trivial parallelization techniques, for two reasons: the stream data emanates from a limited set of sources (exactly one in each of the two applications studied), and the distribution of the data cannot be determined a priori. We propose techniques that address the problems of data distribution and work partitioning in a coordinated fashion, because we believe the two problems are related and must be addressed together. We have parallelized both applications using the Stampede cluster programming system, which provides abstractions for implementing time- and throughput-sensitive applications elegantly and efficiently. For the Video Textures application we achieve a speedup of 24.26 on a 112-processor cluster. For the Color Tracker application, where latency is more crucial, we identify the degree of data parallelism needed so that the slowest stage of the pipeline is no longer the bottleneck to achieving an acceptable frame rate.
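The arithmetic below makes the abstract's two quantitative claims concrete. This is a deliberately simplified model, not the authors' method. It assumes the following:

* The application is a linear pipeline.
* A stage replicated over r data-parallel workers processes frames exactly r times faster.
* Communication and data-distribution costs, which the abstract identifies as the hard part, are ignored.

The function names and the stage times in the usage example are hypothetical. The sketch also works out that a speedup of 24.26 on 112 processors corresponds to a parallel efficiency of roughly 22%.

```python
import math
from typing import Sequence


def pipeline_throughput_fps(stage_times_ms: Sequence[float], replicas: Sequence[int]) -> float:
    """Steady-state frames per second of a linear pipeline (simplified model).

    Assumption: stage i needs stage_times_ms[i] ms per frame on one worker, and
    replicas[i] workers split its frames perfectly, so its effective per-frame
    time is stage_times_ms[i] / replicas[i]. The slowest stage sets the rate.
    """
    if len(stage_times_ms) != len(replicas):
        raise ValueError("need one replica count per stage")
    effective_ms = [t / r for t, r in zip(stage_times_ms, replicas)]
    return 1000.0 / max(effective_ms)


def replicas_to_remove_bottleneck(stage_times_ms: Sequence[float], stage: int) -> int:
    """Smallest worker count for `stage` so that it is no slower than the
    slowest other stage (assumed to keep a single worker each)."""
    others = [t for i, t in enumerate(stage_times_ms) if i != stage]
    target_ms = max(others)
    return max(1, math.ceil(stage_times_ms[stage] / target_ms))


def parallel_efficiency(speedup: float, processors: int) -> float:
    """Fraction of ideal linear speedup achieved: E = S / p."""
    return speedup / processors


if __name__ == "__main__":
    # Hypothetical per-frame stage times in ms (not measurements from the paper).
    stages = [10.0, 80.0, 20.0]
    k = replicas_to_remove_bottleneck(stages, stage=1)  # ceil(80 / 20) = 4
    print(k, pipeline_throughput_fps(stages, [1, 1, 1]),
          pipeline_throughput_fps(stages, [1, k, 1]))
    # Video Textures figure from the abstract: 24.26 / 112 is about 0.217.
    print(round(parallel_efficiency(24.26, 112), 3))
```

In this model, four workers on the 80 ms stage bring it down to 20 ms per frame. That stage then ties with the 20 ms stage, and throughput rises from 12.5 to 50 frames per second. Adding more workers to that stage alone would not help, because the other 20 ms stage becomes the limit. Real gains on a cluster are lower once the data-distribution overheads the paper addresses are included.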