CASEE: a hierarchical event representation for the analysis of videos

  • Authors:
  • Asaad Hakeem; Yaser Sheikh; Mubarak Shah

  • Affiliations:
  • University of Central Florida, Orlando, FL; University of Central Florida, Orlando, FL; University of Central Florida, Orlando, FL

  • Venue:
  • AAAI'04: Proceedings of the 19th National Conference on Artificial Intelligence
  • Year:
  • 2004

Abstract

A representational gap exists between low-level measurements (segmentation, object classification, tracking) and high-level understanding of video sequences. In this paper, we propose a novel representation of events in videos to bridge this gap, based on the CASE representation of natural languages. The proposed representation makes three significant contributions over existing frameworks. First, we recognize the importance of causal and temporal relationships between sub-events and extend CASE to allow the representation of temporal structure and causality between sub-events. Second, in order to capture both multi-agent and multi-threaded events, we introduce a hierarchical CASE representation of events in terms of sub-events and case-lists. Last, for purposes of implementation, we present the concept of a temporal event-tree and pose the problem of event detection as subtree pattern matching. By extending CASE, a natural language representation, to the representation of events, the proposed work allows a plausible means of interface between users and the computer. We show two important applications of the proposed event representation: the automated annotation of standard meeting video sequences, and event detection in extended videos of railroad crossings.
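The abstract poses event detection as subtree pattern matching over a temporal event-tree. The Python sketch below is an illustration only: the class name `EventNode`, its fields (`predicate`, `agent`, `temporal_rel`, `sub_events`), and the greedy, order-preserving matching policy are assumptions made for this example and do not reproduce the CASE-based representation or the matching algorithm defined in the paper.

```python
# Hypothetical sketch: a minimal event-tree and a naive subtree matcher,
# suggesting how detection could be cast as subtree pattern matching.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EventNode:
    """A node in a temporal event-tree: a (sub)event with an optional agent,
    a temporal relation to its parent, and an ordered list of sub-events."""
    predicate: str                      # e.g. "enter", "pick-up", "cross"
    agent: Optional[str] = None         # e.g. "person-1", "train"
    temporal_rel: Optional[str] = None  # relation to parent, e.g. "BEFORE", "DURING"
    sub_events: List["EventNode"] = field(default_factory=list)


def matches(pattern: EventNode, node: EventNode) -> bool:
    """Return True if `pattern` matches the subtree rooted at `node`.
    Fields left as None in the pattern act as wildcards; pattern children are
    matched against the node's children greedily, preserving order."""
    if pattern.predicate != node.predicate:
        return False
    if pattern.temporal_rel is not None and pattern.temporal_rel != node.temporal_rel:
        return False
    if pattern.agent is not None and pattern.agent != node.agent:
        return False
    i = 0
    for p_child in pattern.sub_events:
        while i < len(node.sub_events) and not matches(p_child, node.sub_events[i]):
            i += 1
        if i == len(node.sub_events):
            return False
        i += 1
    return True


def detect(pattern: EventNode, root: EventNode) -> List[EventNode]:
    """Collect every node in the event-tree whose subtree matches `pattern`."""
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        if matches(pattern, node):
            hits.append(node)
        stack.extend(node.sub_events)
    return hits


if __name__ == "__main__":
    # Toy event-tree for a railroad-crossing scene (made-up labels).
    scene = EventNode("crossing-scene", sub_events=[
        EventNode("gate-lower", agent="gate", temporal_rel="BEFORE"),
        EventNode("vehicle-cross", agent="car-3", temporal_rel="DURING"),
        EventNode("train-pass", agent="train", temporal_rel="AFTER"),
    ])
    # Pattern: a vehicle crossing while the gate sequence is active.
    pattern = EventNode("crossing-scene", sub_events=[
        EventNode("gate-lower"),
        EventNode("vehicle-cross", temporal_rel="DURING"),
    ])
    print(len(detect(pattern, scene)))  # prints 1
```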