Multi-Modal Dialog Scene Detection Using Hidden Markov Models for Content-Based Multimedia Indexing

  • Authors:
  • A. Aydin Alatan; Ali N. Akansu; Wayne Wolf

  • Affiliations:
  • Electrical-Electronics Engineering Department, Middle East Technical University, Balgat, Ankara 06531, Turkey. alatan@eee.metu.edu.tr; New Jersey Center for Multimedia Research, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA; Department of Electrical Engineering, Princeton University, Princeton, NJ 08544-5263, USA

  • Venue:
  • Multimedia Tools and Applications
  • Year:
  • 2001

Abstract

A class of audio-visual data (fiction entertainment: movies, TV series) is segmented into scenes containing dialogs using a novel hidden Markov model (HMM) based method. Each shot is classified using both its audio track (via classification into speech, silence, and music) and its visual content (face and location information). The result of this shot-based classification is an audio-visual token that drives the HMM state diagram to perform scene analysis. Simulations with circular and left-to-right HMM topologies show that both perform very well with multi-modal inputs. Moreover, for the circular topology, comparisons between different training and observation sets show that audio and face information together give the most consistent results across observation sets.
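To make the pipeline described in the abstract concrete, the sketch below shows how shot-level audio-visual tokens could be decoded into scene states with a discrete HMM. The token alphabet, the three-state circular topology, and all probabilities are illustrative assumptions for this sketch, not the parameters used in the paper; the decoding step is a standard Viterbi pass implemented with NumPy.

```python
import numpy as np

# Assumed audio-visual token alphabet: each shot is mapped to one discrete
# symbol combining its audio class (speech / silence / music) and whether a
# face was detected. These labels are illustrative, not taken from the paper.
TOKENS = {
    "speech+face": 0, "speech+noface": 1,
    "silence+face": 2, "silence+noface": 3,
    "music+face": 4, "music+noface": 5,
}

# Hypothetical 3-state circular topology:
# establishing -> dialog -> transition -> establishing -> ...
STATES = ["establishing", "dialog", "transition"]

# Transition matrix for the circular topology (self-loop plus one forward
# edge per state, wrapping around); values are placeholders, not trained.
A = np.array([
    [0.7, 0.3, 0.0],
    [0.0, 0.8, 0.2],
    [0.3, 0.0, 0.7],
])

# Emission probabilities P(token | state); again purely illustrative.
B = np.array([
    [0.10, 0.15, 0.10, 0.25, 0.15, 0.25],   # establishing
    [0.45, 0.25, 0.15, 0.05, 0.05, 0.05],   # dialog
    [0.05, 0.10, 0.10, 0.20, 0.25, 0.30],   # transition
])

pi = np.array([0.6, 0.2, 0.2])  # initial state distribution


def viterbi(obs, A, B, pi):
    """Return the most likely state sequence for a list of token indices."""
    n_states, T = A.shape[0], len(obs)
    delta = np.zeros((T, n_states))          # best log-score per state
    psi = np.zeros((T, n_states), dtype=int)  # backpointers
    with np.errstate(divide="ignore"):        # log(0) -> -inf is fine here
        logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    delta[0] = logpi + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA  # scores[i, j]: prev i -> state j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return [STATES[s] for s in reversed(path)]


# Example: a shot sequence whose middle stretch looks like a dialog.
shots = ["music+noface", "silence+noface", "speech+face", "speech+face",
         "speech+noface", "speech+face", "music+noface"]
print(viterbi([TOKENS[s] for s in shots], A, B, pi))
```

In practice the transition and emission probabilities would be learned from labeled shot sequences (e.g., with Baum-Welch), and a left-to-right topology would simply replace the wrap-around edge in `A`; shots decoded into the dialog state would then be grouped into dialog scenes.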