Computable scenes and structures in films

Authors:
H. Sundaram;Shih-Fu Chang
Affiliations:
Dept. of Electr. Eng., Columbia Univ., New York, NY, USA;-
Venue:
IEEE Transactions on Multimedia
Year:
2002

Citing 0
Cited 18

A utility framework for the automatic generation of audio-visual skims

Proceedings of the tenth ACM international conference on Multimedia
Semantic video classification and feature subset selection under context and concept uncertainty

Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
Narrative abstraction model for story-oriented video

Proceedings of the 12th annual ACM international conference on Multimedia
Topic transition detection using hierarchical hidden Markov and semi-Markov models

Proceedings of the 13th annual ACM international conference on Multimedia
Learning rich semantics from news video archives by style analysis

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
Segmentation, categorization, and identification of commercial clips from TV streams using multimodal analysis

MULTIMEDIA '06 Proceedings of the 14th annual ACM international conference on Multimedia
A narrative-based abstraction framework for story-oriented video

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
Incorporating feature hierarchy and boosting to achieve more effective classifier training and concept-oriented video summarization and skimming

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
Scene detection in videos using shot clustering and sequence alignment

IEEE Transactions on Multimedia
Content-based image and video indexing and retrieval

Proceedings of the 2005 joint Chinese-German conference on Cognitive systems
Affective content-based film clips retrieval algorithm using improved fuzzy comprehensive evaluation

IITA'09 Proceedings of the 3rd international conference on Intelligent information technology application
Automatic video abstraction via the progress of story

PCM'10 Proceedings of the 11th Pacific Rim conference on Advances in multimedia information processing: Part I
Video scene detection using graph-based representations

Image Communication
Dominant sets based movie scene detection

Signal Processing
A method of generating table of contents for educational videos

PCM'05 Proceedings of the 6th Pacific-Rim conference on Advances in Multimedia Information Processing - Volume Part II
Video scene segmentation using sequential change detection

PCM'04 Proceedings of the 5th Pacific Rim conference on Advances in Multimedia Information Processing - Volume Part III
Improvement of commercial boundary detection using audiovisual features

PCM'05 Proceedings of the 6th Pacific-Rim conference on Advances in Multimedia Information Processing - Volume Part I
Interactive multimedia system for distance learning of higher education

Edutainment'06 Proceedings of the First international conference on Technologies for E-Learning and Digital Entertainment

Abstract

We present a computational scene model and derive novel algorithms for computing audio and visual scenes and within-scene structures in films. The model uses constraints derived from film-making rules and from experimental results in the psychology of audition. Central to the model is the notion of a causal, finite-memory viewer model. We segment the audio and video data separately; in each case, we determine the degree of correlation of the most recent data in the memory with the past. The audio and video scene boundaries are determined using local maxima and minima, respectively. We derive four types of computable scenes that arise from different kinds of synchronization between audio and video scene boundaries. We show how to exploit the local topology of an image sequence, in conjunction with statistical tests, to determine dialogs. We also derive a simple algorithm to detect silences in audio. An important feature of our work is the introduction of semantic constraints based on structure and silence into the computational model, which results in computable scenes that are more consistent with human observations. The algorithms were tested on a difficult data set: the first hour of each of three commercial films. The best results were 94% for computable scene detection and 91% recall with 100% precision for dialog detection.
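The video side of the finite-memory idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the coherence function, so the histogram-intersection similarity, the function names, and the `attention`, `memory`, and `threshold` parameters below are all assumptions chosen for illustration. Each shot is represented by a normalized feature histogram, the most recent shots are compared against the older shots still held in memory, and local minima of that score are reported as candidate scene boundaries.

```python
from typing import List, Sequence


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Histogram intersection of two normalized histograms (assumed measure)."""
    return sum(min(x, y) for x, y in zip(a, b))


def coherence(shots: Sequence[Sequence[float]], t: int,
              attention: int = 2, memory: int = 6) -> float:
    """Mean similarity between the most recent shots and the rest of memory.

    The `attention` shots starting at index t play the role of the most
    recent data; the `memory - attention` shots before t are the past.
    The score is causal: it is available once shot t + attention - 1 is seen.
    """
    recent = shots[t:t + attention]
    past = shots[max(0, t - (memory - attention)):t]
    if not recent or not past:
        return 1.0  # not enough context to suggest a boundary
    total = sum(similarity(r, p) for r in recent for p in past)
    return total / (len(recent) * len(past))


def local_minima(values: Sequence[float], threshold: float) -> List[int]:
    """Indices of local minima that also fall below `threshold`."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1]
            and values[i] <= values[i + 1]
            and values[i] < threshold]


def video_scene_boundaries(shots: Sequence[Sequence[float]],
                           attention: int = 2, memory: int = 6,
                           threshold: float = 0.5) -> List[int]:
    """Candidate boundaries: shot indices where coherence has a local minimum."""
    scores = [coherence(shots, t, attention, memory) for t in range(len(shots))]
    return local_minima(scores, threshold)


# Toy input: five shots with one histogram followed by five with another.
toy_shots = [[1.0, 0.0]] * 5 + [[0.0, 1.0]] * 5
boundaries = video_scene_boundaries(toy_shots)
```

On the toy input, the coherence score drops to zero at index 5, the first shot of the second group, so that index is the only reported boundary. An audio counterpart would, per the abstract, look for local maxima of its own correlation measure rather than minima.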