Entropy metrics used for video summarization
SCCG '02 Proceedings of the 18th spring conference on Computer graphics
Adaptive speaker identification with audiovisual cues for movie content analysis
Pattern Recognition Letters - Video computing
Efficient retrieval of life log based on context and content
Proceedings of the 1st ACM workshop on Continuous archival and retrieval of personal experiences
Multimodal Video Indexing: A Review of the State-of-the-art
Multimedia Tools and Applications
Early versus late fusion in semantic video analysis
Proceedings of the 13th annual ACM international conference on Multimedia
Mobile face detection and tracking for media streaming applications
International Journal of Wireless and Mobile Computing
Video Shot Boundary Detection Using Generalized Eigenvalue Decomposition
ICCSA '09 Proceedings of the International Conference on Computational Science and Its Applications: Part II
Parsing news video using integrated audio-video features
PReMI'05 Proceedings of the First international conference on Pattern Recognition and Machine Intelligence
Novel concept for video retrieval in life log application
PCM'04 Proceedings of the 5th Pacific Rim Conference on Advances in Multimedia Information Processing - Volume Part II
A content-based video parsing and indexing method is presented in this paper that analyzes both information sources (auditory and visual) and accounts for their inter-relations and synergy to extract high-level semantic information. Both frame- and object-based access to the visual information is employed. The aim of the method is to extract semantically meaningful video scenes and assign semantic label(s) to them. Because video is temporal in nature, time must be accounted for; thus, time-constrained video representations and indices are generated. The current approach searches for specific types of content information relevant to the presence or absence of speakers or persons. Audio-source parsing and indexing yields a speaker-label mapping of the source over time, while video-source parsing and indexing yields a talking-face shot mapping over time. Integrating the audio and visual mappings, constrained by interaction rules, leads to higher levels of video abstraction and even partial detection of the video's context.
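The fusion step described above — combining a time-indexed speaker-label mapping with a talking-face shot mapping — can be illustrated as an interval intersection. The sketch below is not the paper's implementation; the segment boundaries, labels, and the `intersect_mappings` helper are hypothetical, and the paper's interaction rules would replace the simple overlap test.

```python
# Illustrative sketch (assumed data, not the paper's method): fuse an
# audio speaker-label mapping with a video talking-face shot mapping
# by intersecting their time intervals.

def intersect_mappings(audio_segments, face_shots):
    """Return (start, end, speaker) spans where a labeled speaker in
    the audio timeline overlaps a detected talking-face shot."""
    fused = []
    for a_start, a_end, speaker in audio_segments:
        for f_start, f_end in face_shots:
            start, end = max(a_start, f_start), min(a_end, f_end)
            if start < end:  # keep only non-empty overlaps
                fused.append((start, end, speaker))
    return sorted(fused)

# Hypothetical mappings: audio says speaker A talks 0-8 s and B talks
# 8-15 s; video detects talking-face shots during 2-10 s and 12-14 s.
audio = [(0.0, 8.0, "A"), (8.0, 15.0, "B")]
faces = [(2.0, 10.0), (12.0, 14.0)]
print(intersect_mappings(audio, faces))
# → [(2.0, 8.0, 'A'), (8.0, 10.0, 'B'), (12.0, 14.0, 'B')]
```

In a real system the interaction rules would further filter these overlaps (e.g., requiring a minimum overlap duration before labeling a segment as a talking-face scene of a given speaker).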