The concept of a documentary scene was inferred from the audio-visual characteristics of certain documentary videos. We observed that the visual component alone rarely conveys enough information to establish a semantic context for most portions of these videos, whereas observing the visual and audio components jointly conveys a much clearer context. Based on these observations, we computed an audio score and a visual score from the video data. We then computed a weighted audio-visual score over an interval and adaptively expanded or shrank the interval until the score reached a local maximum. Repeating this process partitions the video into a set of intervals corresponding to the documentary scenes. Finally, we checked the resulting set of scenes for redundant detections.
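The interval-growing procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-frame score lists, the equal-weight fusion (`w = 0.5`), the mean-over-interval scoring, and the fixed `init_len`/`step` parameters are all assumptions introduced here for concreteness.

```python
def av_score(audio, visual, start, end, w=0.5):
    """Weighted audio-visual score over the frame interval [start, end).

    `audio` and `visual` are hypothetical per-frame score lists; the
    mean-based fusion is an assumption, not the paper's exact formula.
    """
    a = sum(audio[start:end]) / (end - start)
    v = sum(visual[start:end]) / (end - start)
    return w * a + (1 - w) * v

def detect_scene_boundary(audio, visual, start, init_len=30, step=5, w=0.5):
    """Adaptively expand or shrink the interval end until the
    weighted audio-visual score reaches a local maximum."""
    n = len(audio)
    end = min(start + init_len, n)
    best = av_score(audio, visual, start, end, w)
    while True:
        grown = min(end + step, n)            # candidate: expanded interval
        shrunk = max(end - step, start + 1)   # candidate: shrunk interval
        candidates = [(av_score(audio, visual, start, e, w), e)
                      for e in (grown, shrunk) if e != end]
        if not candidates:
            return end
        score, e = max(candidates)
        if score <= best:
            return end  # local maximum reached
        best, end = score, e

def segment_scenes(audio, visual, **kwargs):
    """Partition the whole video into contiguous documentary-scene intervals,
    restarting the boundary search at the end of each detected scene."""
    scenes, start = [], 0
    while start < len(audio):
        end = detect_scene_boundary(audio, visual, start, **kwargs)
        scenes.append((start, end))
        start = end
    return scenes
```

Because each accepted move strictly increases the interval score and there are only finitely many candidate endpoints, the expand/shrink loop always terminates; a redundancy check over the returned scene list (e.g. merging near-duplicate adjacent intervals) would follow as a post-processing step.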