Automatic retrieval of visual continuity errors in movies
Proceedings of the ACM International Conference on Image and Video Retrieval
Movies and TV are a rich source of diverse and complex video of people, objects, actions and locales "in the wild". Automatically harvesting labeled action sequences from such video would enable the creation of large-scale, highly varied datasets. To enable such collection, we focus on recovering scene structure in movies and TV series for object tracking and action retrieval. We present a weakly supervised algorithm that uses the screenplay and closed captions to parse a movie into a hierarchy of shots and scenes. Scene boundaries in the movie are aligned with screenplay scene labels, and shots are reordered into long continuous tracks, or threads, which allow more accurate tracking of people, actions and objects. Scene segmentation, alignment, and shot threading are formulated as inference in a unified generative model, and we present a novel hierarchical dynamic programming algorithm that handles alignment and jump-limited reorderings in linear time. We report quantitative and qualitative results on movie alignment and parsing, and use the recovered structure to improve character naming and the retrieval of common actions in several episodes of popular TV series.
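To make the alignment idea concrete, the sketch below shows a minimal monotone dynamic program that assigns each shot to a screenplay scene so that scene indices never decrease, maximizing a similarity score. This is an illustrative simplification, not the paper's hierarchical algorithm: the function and variable names (`align_scenes_to_shots`, `similarity`) are hypothetical, and `similarity` stands in for whatever dialogue/closed-caption match score is actually used.

```python
def align_scenes_to_shots(scenes, shots, similarity):
    """Illustrative sketch (not the paper's algorithm): label each shot with a
    scene index such that labels are non-decreasing over time, maximizing the
    total similarity. Runs in O(len(scenes) * len(shots)) time."""
    n, m = len(scenes), len(shots)
    NEG = float("-inf")
    # dp[i][j]: best score for shots[0..j] with shot j assigned to scene i.
    dp = [[NEG] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    dp[0][0] = similarity(scenes[0], shots[0])
    for j in range(1, m):
        for i in range(n):
            s = similarity(scenes[i], shots[j])
            stay = dp[i][j - 1]                      # shot j-1 in the same scene
            advance = dp[i - 1][j - 1] if i else NEG  # scene boundary before shot j
            if stay >= advance and stay > NEG:
                dp[i][j], back[i][j] = stay + s, i
            elif advance > NEG:
                dp[i][j], back[i][j] = advance + s, i - 1
    # Trace back the scene label of every shot from the best final state.
    i = max(range(n), key=lambda k: dp[k][m - 1])
    labels = [0] * m
    for j in range(m - 1, -1, -1):
        labels[j] = i
        if j > 0:
            i = back[i][j]
    return labels
```

The paper's actual model is richer (it jointly handles segmentation, alignment, and jump-limited shot reordering), but the same non-decreasing-assignment constraint is what makes the alignment step amenable to dynamic programming.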