Multi-Modal Dialog Scene Detection Using Hidden Markov Models for Content-Based Multimedia Indexing

  • Authors:
  • A. Aydin Alatan; Ali N. Akansu; Wayne Wolf

  • Affiliations:
  • Electrical-Electronics Engineering Department, Middle East Technical University, Balgat, Ankara 06531, Turkey. alatan@eee.metu.edu.tr; New Jersey Center for Multimedia Research, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA; Department of Electrical Engineering, Princeton University, Princeton, NJ 08544-5263, USA

  • Venue:
  • Multimedia Tools and Applications
  • Year:
  • 2001

Abstract

A class of audio-visual data (fiction entertainment: movies, TV series) is segmented into scenes containing dialogs using a novel hidden Markov model (HMM) based method. Each shot is classified using both its audio track (via classification into speech, silence, and music) and its visual content (face and location information). The result of this shot-based classification is an audio-visual token that drives the HMM state diagram to perform scene analysis. Simulations with circular and left-to-right HMM topologies show that both perform very well with multi-modal inputs. Moreover, for the circular topology, comparisons between different training and observation sets show that audio and face information together give the most consistent results across observation sets.
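To make the pipeline described in the abstract concrete, the sketch below shows how shot-level audio-visual tokens could be decoded into scene states with a discrete HMM. The token alphabet, the three-state circular topology, and all probabilities are illustrative assumptions for this sketch, not the parameters used in the paper; the decoding step is a standard Viterbi pass implemented with NumPy.

```python
import numpy as np

# Assumed audio-visual token alphabet: each shot is mapped to one discrete
# symbol combining its audio class (speech / silence / music) and whether a
# face was detected. These labels are illustrative, not taken from the paper.
TOKENS = {
    "speech+face": 0, "speech+noface": 1,
    "silence+face": 2, "silence+noface": 3,
    "music+face": 4, "music+noface": 5,
}

# Hypothetical 3-state circular topology:
# establishing -> dialog -> transition -> establishing -> ...
STATES = ["establishing", "dialog", "transition"]

# Transition matrix for the circular topology (self-loop plus one forward
# edge per state, wrapping around); values are placeholders, not trained.
A = np.array([
    [0.7, 0.3, 0.0],
    [0.0, 0.8, 0.2],
    [0.3, 0.0, 0.7],
])

# Emission probabilities P(token | state); again purely illustrative.
B = np.array([
    [0.10, 0.15, 0.10, 0.25, 0.15, 0.25],   # establishing
    [0.45, 0.25, 0.15, 0.05, 0.05, 0.05],   # dialog
    [0.05, 0.10, 0.10, 0.20, 0.25, 0.30],   # transition
])

pi = np.array([0.6, 0.2, 0.2])  # initial state distribution


def viterbi(obs, A, B, pi):
    """Return the most likely state sequence for a list of token indices."""
    n_states, T = A.shape[0], len(obs)
    delta = np.zeros((T, n_states))          # best log-score per state
    psi = np.zeros((T, n_states), dtype=int)  # backpointers
    with np.errstate(divide="ignore"):        # log(0) -> -inf is fine here
        logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    delta[0] = logpi + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA  # scores[i, j]: prev i -> state j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return [STATES[s] for s in reversed(path)]


# Example: a shot sequence whose middle stretch looks like a dialog.
shots = ["music+noface", "silence+noface", "speech+face", "speech+face",
         "speech+noface", "speech+face", "music+noface"]
print(viterbi([TOKENS[s] for s in shots], A, B, pi))
```

In practice the transition and emission probabilities would be learned from labeled shot sequences (e.g., with Baum-Welch), and a left-to-right topology would simply replace the wrap-around edge in `A`; shots decoded into the dialog state would then be grouped into dialog scenes.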