A neural network approach to audio-assisted movie dialogue detection

Authors:
Margarita Kotti; Emmanouil Benetos; Constantine Kotropoulos; Ioannis Pitas
Affiliations:
Artificial Intelligence and Information Analysis Lab, Department of Informatics, Aristotle University of Thessaloniki, Box 451, Thessaloniki 54124, Greece (all authors)
Venue:
Neurocomputing
Year:
2007

Abstract

A novel framework for audio-assisted dialogue detection based on indicator functions and neural networks is investigated. An indicator function specifies whether a given actor is present at a particular time instant. The cross-correlation function of a pair of indicator functions and the magnitude of the corresponding cross-power spectral density are fed as input to neural networks for dialogue detection. Several types of artificial neural networks are tested, including multilayer perceptrons (MLPs), voted perceptrons, radial basis function networks, support vector machines, and particle swarm optimization-based MLPs. Experiments are carried out to validate the feasibility of the approach, using ground-truth indicator functions determined by human observers on six different movies. A total of 41 dialogue instances and another 20 non-dialogue instances are employed. The average detection accuracy achieved is high, ranging between 84.78% ± 5.499% and 91.43% ± 4.239%.
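
The two inputs named in the abstract, the cross-correlation of a pair of indicator functions and the magnitude of their cross-power spectral density, can be illustrated with a minimal NumPy sketch. The abstract does not give the paper's exact definitions, so the function name `dialogue_features`, the absence of normalization, the DFT length of 2N-1, and the toy binary sequences are all assumptions made here for illustration only.

```python
import numpy as np


def dialogue_features(ind_a, ind_b):
    """Return (cross-correlation, |cross-power spectrum|) for two indicator functions.

    Hypothetical helper: the normalization and DFT length are assumptions,
    not the definitions used in the paper.
    """
    a = np.asarray(ind_a, dtype=float)
    b = np.asarray(ind_b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("indicator functions must be 1-D and of equal length")

    n = a.size
    # Linear cross-correlation over all lags -(n-1) .. (n-1);
    # the zero-lag value sits at index n-1.
    xcorr = np.correlate(a, b, mode="full")

    # Cross-power spectrum: DFT of one signal times the conjugate DFT of the
    # other, zero-padded to 2n-1 points so that it corresponds to the linear
    # (not circular) cross-correlation above.
    nfft = 2 * n - 1
    cross_spectrum = np.fft.fft(a, nfft) * np.conj(np.fft.fft(b, nfft))
    return xcorr, np.abs(cross_spectrum)


# Toy example: two actors who strictly alternate (1 = actor present).
actor_a = [1, 1, 0, 0, 1, 1, 0, 0]
actor_b = [0, 0, 1, 1, 0, 0, 1, 1]
xcorr, csd_mag = dialogue_features(actor_a, actor_b)
```

In this toy case the zero-lag cross-correlation is 0 because the two actors never overlap, while nonzero lags capture the turn-taking pattern. Either vector, or both concatenated, could then be passed to a classifier; how the paper assembles its network inputs is not stated in the abstract.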