Exploiting multi-level parallelism for low-latency activity recognition in streaming video

Authors:
Ming-yu Chen; Lily Mummert; Padmanabhan Pillai; Alexander Hauptmann; Rahul Sukthankar
Affiliations:
Carnegie Mellon University, Pittsburgh, PA, USA; Intel Labs Pittsburgh, Pittsburgh, PA, USA; Intel Labs Pittsburgh, Pittsburgh, PA, USA; Carnegie Mellon University, Pittsburgh, PA, USA; Intel Labs Pittsburgh, Pittsburgh, PA, USA
Venue:
MMSys '10 Proceedings of the first annual ACM SIGMM conference on Multimedia systems
Year:
2010

Abstract

Video understanding is a computationally challenging task that is critical not only for traditionally throughput-oriented applications such as search, but also for latency-sensitive interactive applications such as surveillance, gaming, videoconferencing, and vision-based user interfaces. Enabling these types of video processing applications will require not only new algorithms and techniques, but also new runtime systems that optimize latency as well as throughput. In this paper, we present a runtime system called Sprout that achieves low latency by exploiting the parallelism inherent in video understanding applications. We demonstrate the utility of our system on an activity recognition application that employs a robust new descriptor called MoSIFT, which explicitly augments appearance features with motion information. MoSIFT outperforms previous recognition techniques, but like other state-of-the-art techniques, it is computationally expensive: a sequential implementation runs 100 times slower than real time. We describe the implementation of the activity recognition application on Sprout, and show that it can accurately recognize activities at full frame rate (25 fps) and with low latency on a challenging airport surveillance video corpus.
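
The figures quoted in the abstract (25 fps, a sequential implementation 100 times slower than real time) imply a rough lower bound on the parallelism needed to keep up with the stream. The following is a minimal back-of-the-envelope sketch of that arithmetic, not a description of how Sprout schedules work; the function names and the `efficiency` parameter are hypothetical and introduced only for illustration.

```python
import math


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, to keep pace with the stream."""
    return 1000.0 / fps


def min_workers(slowdown: float, efficiency: float = 1.0) -> int:
    """Lower bound on parallel workers needed to reach real-time throughput.

    slowdown:   how many times slower than real time the sequential code runs.
    efficiency: assumed parallel efficiency in (0, 1]; 1.0 means perfect scaling.
                This is a hypothetical knob, not a figure from the paper.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return math.ceil(slowdown / efficiency)


fps = 25.0        # full frame rate stated in the abstract
slowdown = 100.0  # sequential implementation is 100x slower than real time

budget = frame_budget_ms(fps)           # per-frame time budget
sequential_cost = budget * slowdown     # implied sequential cost per frame
workers_ideal = min_workers(slowdown)   # bound under perfect scaling
workers_half = min_workers(slowdown, efficiency=0.5)  # bound at 50% efficiency

print(f"per-frame budget: {budget} ms")
print(f"implied sequential cost per frame: {sequential_cost} ms")
print(f"minimum workers (ideal scaling): {workers_ideal}")
print(f"minimum workers (50% efficiency): {workers_half}")
```

This bound concerns throughput only. Processing many frames concurrently lets the system keep up with the stream, but the latency of any single frame falls only when the work for that frame is itself split across workers, which is one way to read the "multi-level parallelism" in the title.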