A class of audio-visual data (fiction entertainment: movies, TV series) is segmented into scenes, which contain dialogs, using a novel hidden Markov model (HMM)-based method. Each shot is classified using both the audio track (via classification into speech, silence, and music) and the visual content (face and location information). The result of this shot-based classification is an audio-visual token that the HMM state diagram uses to perform scene analysis. In simulations with circular and left-to-right HMM topologies, both perform very well with multimodal inputs. Moreover, for the circular topology, comparisons between different training and observation sets show that audio and face information together give the most consistent results among different observation sets.
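The token-then-HMM pipeline described above can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the integer token encoding, the two states (0 = non-dialogue, 1 = dialogue), the hand-set probabilities, and the helper names `shot_token`, `topology`, `emission`, and `viterbi`. The paper trains its models, whereas this sketch only decodes with fixed parameters to show how the circular and left-to-right topologies constrain the state path.

```python
import math

AUDIO = ("speech", "silence", "music")

def shot_token(audio, has_face, location_changed):
    """Encode one shot as an integer in 0..11 (hypothetical encoding)."""
    return AUDIO.index(audio) * 4 + int(has_face) * 2 + int(location_changed)

def topology(n_states, circular, stay=0.6):
    """Transition matrix where each state either stays or moves to the next.
    Circular: the last state wraps to the first.
    Left-to-right: the last state is absorbing."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i < n_states - 1:
            A[i][i], A[i][i + 1] = stay, 1.0 - stay
        elif circular:
            A[i][i], A[i][0] = stay, 1.0 - stay
        else:
            A[i][i] = 1.0
    return A

def emission(n_tokens, favored, weight=0.8):
    """Emission row: `weight` mass shared by favored tokens, rest uniform."""
    rest = (1.0 - weight) / (n_tokens - len(favored))
    return [weight / len(favored) if t in favored else rest
            for t in range(n_tokens)]

def _log(x):
    return math.log(x) if x > 0 else float("-inf")

def viterbi(tokens, pi, A, B):
    """Most likely state sequence for a token sequence (log domain)."""
    n = len(pi)
    score = [_log(pi[s]) + _log(B[s][tokens[0]]) for s in range(n)]
    back = []
    for tok in tokens[1:]:
        prev = score
        ptr = [max(range(n), key=lambda r: prev[r] + _log(A[r][s]))
               for s in range(n)]
        score = [prev[ptr[s]] + _log(A[ptr[s]][s]) + _log(B[s][tok])
                 for s in range(n)]
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Hypothetical shot sequence: two dialogues separated by a music shot.
shots = [("music", False, True), ("speech", True, False), ("speech", True, False),
         ("music", False, True), ("speech", True, False), ("speech", True, False)]
tokens = [shot_token(*s) for s in shots]

pi = [1.0, 0.0]                          # always start in non-dialogue
B = [emission(12, {4, 5, 8, 9}),         # state 0: silence/music, no face
     emission(12, {2, 3})]               # state 1: speech with a face

circular_path = viterbi(tokens, pi, topology(2, circular=True), B)
ltr_path = viterbi(tokens, pi, topology(2, circular=False), B)
print("circular:     ", circular_path)
print("left-to-right:", ltr_path)
```

In this toy setting the circular topology can leave the dialogue state at the music shot and re-enter it, while the left-to-right topology must remain in the dialogue state once it has been entered, so it suits sequences containing a single scene.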