Integrated Mining of Visual Features, Speech Features, and Frequent Patterns for Semantic Video Annotation

Authors:
V. S. Tseng;Ja-Hwung Su;Jhih-Hong Huang;Chih-Jen Chen
Affiliations:
Nat. Cheng Kung Univ., Tainan;-;-;-
Venue:
IEEE Transactions on Multimedia
Year:
2008

Citing 0
Cited 11

Ontology-Based Semantic Web Image Retrieval by Utilizing Textual and Visual Annotations
WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03

Personalized rough-set-based recommendation by integrating multiple contents and collaborative information
Information Sciences: an International Journal

Human action annotation, modeling and analysis based on implicit user interaction
Multimedia Tools and Applications

Knowledge-discounted event detection in sports video
IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans - Special issue on model-based diagnostics

Implicit visual concept modeling in image / video annotation
Proceedings of the first ACM international workshop on Analysis and retrieval of tracked events and motion in imagery streams

Mining anomalous events against frequent sequences in surveillance videos from commercial environments
Expert Systems with Applications: An International Journal

An automatic web-oriented multimedia extraction and multiresolution visualization scheme
ACA'12 Proceedings of the 11th international conference on Applications of Electrical and Computer Engineering

Classifier-specific intermediate representation for multimedia tasks
Proceedings of the 2nd ACM International Conference on Multimedia Retrieval

Cross community news event summary generation based on collaborative ranking
Proceedings of the 4th International Conference on Internet Multimedia Computing and Service

Knowledge adaptation for ad hoc multimedia event detection with few exemplars
Proceedings of the 20th ACM international conference on Multimedia

Automatic annotation of image databases based on implicit crowdsourcing, visual concept modeling and evolution
Multimedia Tools and Applications

Abstract

To support effective multimedia information retrieval, video annotation has become an important topic in video content analysis. Existing video annotation methods focus on either the analysis of low-level features or simple semantic concepts, and they fail to narrow the gap between low-level features and high-level concepts. In this paper, we propose an innovative method for semantic video annotation through integrated mining of visual features, speech features, and frequent semantic patterns in the video. The proposed method consists of two main phases: 1) construction of four kinds of predictive annotation models from annotated videos, namely speech-association, visual-association, visual-sequential, and statistical models; and 2) fusion of these models to annotate unannotated videos automatically. The main advantage of the proposed method is that visual features, speech features, and semantic patterns are all considered simultaneously. Moreover, high-level rules can compensate for the weakness of statistics-based methods in identifying complex and broad keywords in video annotation. Empirical evaluation on NIST TRECVID video datasets shows that the proposed approach substantially improves annotation performance in terms of precision, recall, and F-measure.
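
The abstract does not say how the four models are fused, so the following is only a minimal sketch under stated assumptions: each model is assumed to emit a score per candidate keyword, fusion is assumed to be a weighted sum, and keywords above a threshold are kept. The function names, weights, threshold, and sample scores are all hypothetical and are not taken from the paper. Precision, recall, and F-measure follow their standard definitions.

```python
from typing import Dict, Set, Tuple

# Hypothetical per-model keyword scores. The model names mirror the four
# kinds of models listed in the abstract; the numbers are made up.
model_scores: Dict[str, Dict[str, float]] = {
    "speech_association": {"anchor": 0.9, "weather": 0.2},
    "visual_association": {"anchor": 0.6, "sports": 0.7},
    "visual_sequential": {"sports": 0.5},
    "statistical": {"anchor": 0.4, "weather": 0.3},
}

# Assumption: equal weights. The paper's actual fusion scheme is not
# described in the abstract.
weights: Dict[str, float] = {name: 0.25 for name in model_scores}


def fuse_scores(scores: Dict[str, Dict[str, float]],
                model_weights: Dict[str, float]) -> Dict[str, float]:
    """Combine per-model keyword scores with a weighted sum."""
    fused: Dict[str, float] = {}
    for model, keyword_scores in scores.items():
        w = model_weights.get(model, 0.0)
        for keyword, score in keyword_scores.items():
            fused[keyword] = fused.get(keyword, 0.0) + w * score
    return fused


def annotate(fused: Dict[str, float], threshold: float) -> Set[str]:
    """Keep every keyword whose fused score reaches the threshold."""
    return {kw for kw, score in fused.items() if score >= threshold}


def precision_recall_f(predicted: Set[str],
                       truth: Set[str]) -> Tuple[float, float, float]:
    """Standard precision, recall, and F-measure (harmonic mean)."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


fused = fuse_scores(model_scores, weights)
predicted = annotate(fused, threshold=0.25)            # hypothetical threshold
ground_truth = {"anchor", "weather"}                   # hypothetical labels
precision, recall, f_measure = precision_recall_f(predicted, ground_truth)
print(sorted(predicted), precision, recall, f_measure)
```

With these made-up inputs, one of the two predicted keywords matches the two ground-truth keywords, so precision, recall, and F-measure are each 0.5. A weighted sum is only one possible fusion rule; voting or rule-priority schemes would fit the same interface.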