Scene Change Detection Based on Audio-Visual Analysis and Interaction
Proceedings of the 10th International Workshop on Theoretical Foundations of Computer Vision: Multi-Image Analysis
Clustering of Imperfect Transcripts Using a Novel Similarity Measure
Information Retrieval Techniques for Speech Applications [this book is based on the workshop “Information Retrieval Techniques for Speech Applications”, held as part of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in New Orleans, USA, in September 2001].
Multimodal analysis of recorded video for e-learning
Proceedings of the 13th annual ACM international conference on Multimedia
An audio-visual content analysis method is presented that analyzes both auditory and visual information sources and accounts for their interrelations and coincidence in order to extract high-level semantic information. Both shot-based and object-based access to the visual information are employed. Because video is inherently temporal, time has to be accounted for, so time-constrained video labelling functions are generated. Audio source parsing yields a speaker identity mapping function over time. Visual source parsing yields a talking-face shot mapping function over time. Integrating the audio and visual mappings under a set of interaction rules leads to more detailed video content descriptions and even partial detection of its context.
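As a rough illustration of the integration step, the sketch below represents each time-constrained labelling function as a list of `(start, end, label)` segments and combines the two mappings interval by interval. The abstract does not state the interaction rules, so the four rules used here, the segment representation, and the names `label_at` and `integrate` are all assumptions made for illustration only, not the authors' method.

```python
from typing import List, Optional, Tuple

# A labelling function over time, stored as (start_sec, end_sec, label) segments.
Segment = Tuple[float, float, str]


def label_at(segments: List[Segment], t: float) -> Optional[str]:
    """Return the label active at time t, or None if no segment covers t."""
    for start, end, label in segments:
        if start <= t < end:
            return label
    return None


def integrate(speakers: List[Segment], faces: List[Segment]) -> List[Segment]:
    """Combine a speaker mapping and a talking-face mapping.

    The rules below are hypothetical examples of interaction rules:
      speaker + face  -> "<speaker> speaking on screen"
      speaker only    -> "<speaker> speaking off screen"
      face only       -> "face visible, no speech"
      neither         -> "no speaker, no face"
    """
    # Every segment boundary from either source is a potential change point.
    cuts = sorted({t for s, e, _ in speakers + faces for t in (s, e)})
    result: List[Segment] = []
    for a, b in zip(cuts, cuts[1:]):
        mid = (a + b) / 2.0  # both mappings are constant inside (a, b)
        spk = label_at(speakers, mid)
        face = label_at(faces, mid)
        if spk and face:
            desc = f"{spk} speaking on screen"
        elif spk:
            desc = f"{spk} speaking off screen"
        elif face:
            desc = "face visible, no speech"
        else:
            desc = "no speaker, no face"
        # Merge with the previous interval when the description is unchanged.
        if result and result[-1][2] == desc and result[-1][1] == a:
            result[-1] = (result[-1][0], b, desc)
        else:
            result.append((a, b, desc))
    return result


if __name__ == "__main__":
    speakers = [(0.0, 10.0, "A"), (10.0, 20.0, "B")]  # made-up audio parsing output
    faces = [(5.0, 15.0, "talking_face")]             # made-up visual parsing output
    for seg in integrate(speakers, faces):
        print(seg)
```

Splitting the timeline at every boundary from either source keeps the combination rule simple, since both mappings are constant within each resulting interval. A real system would replace the hard-coded rules with whatever interaction rules the method actually defines.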