The concept of a documentary scene was inferred from the audio-visual characteristics of certain documentary videos. We observed that the visual component alone rarely conveys enough information to establish a semantic context for most portions of these videos, whereas observing the visual and audio components jointly conveys a much clearer context. Based on these observations, we computed an audio score and a visual score from the video data. We then computed a weighted audio-visual score over an interval and adaptively expanded or shrank the interval until the score reached a local maximum. Repeating this process partitions the video into a set of intervals corresponding to the documentary scenes. Finally, we checked the resulting set of scenes for redundant detections.
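The interval-growing procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-frame score lists, the equal-weight fusion (`w = 0.5`), the mean-over-interval scoring, and the fixed `init_len`/`step` parameters are all assumptions introduced here for concreteness.

```python
def av_score(audio, visual, start, end, w=0.5):
    """Weighted audio-visual score over the frame interval [start, end).

    `audio` and `visual` are hypothetical per-frame score lists; the
    mean-based fusion is an assumption, not the paper's exact formula.
    """
    a = sum(audio[start:end]) / (end - start)
    v = sum(visual[start:end]) / (end - start)
    return w * a + (1 - w) * v

def detect_scene_boundary(audio, visual, start, init_len=30, step=5, w=0.5):
    """Adaptively expand or shrink the interval end until the
    weighted audio-visual score reaches a local maximum."""
    n = len(audio)
    end = min(start + init_len, n)
    best = av_score(audio, visual, start, end, w)
    while True:
        grown = min(end + step, n)            # candidate: expanded interval
        shrunk = max(end - step, start + 1)   # candidate: shrunk interval
        candidates = [(av_score(audio, visual, start, e, w), e)
                      for e in (grown, shrunk) if e != end]
        if not candidates:
            return end
        score, e = max(candidates)
        if score <= best:
            return end  # local maximum reached
        best, end = score, e

def segment_scenes(audio, visual, **kwargs):
    """Partition the whole video into contiguous documentary-scene intervals,
    restarting the boundary search at the end of each detected scene."""
    scenes, start = [], 0
    while start < len(audio):
        end = detect_scene_boundary(audio, visual, start, **kwargs)
        scenes.append((start, end))
        start = end
    return scenes
```

Because each accepted move strictly increases the interval score and there are only finitely many candidate endpoints, the expand/shrink loop always terminates; a redundancy check over the returned scene list (e.g. merging near-duplicate adjacent intervals) would follow as a post-processing step.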