Semantic video annotation by mining association patterns from visual and speech features

  • Authors:
  • Vincent S. Tseng; Ja-Hwung Su; Jhih-Hong Huang; Chih-Jen Chen

  • Affiliations:
  • All authors: Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C.

  • Venue:
  • PAKDD'08: Proceedings of the 12th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2008

Abstract

In this paper, we propose a novel approach to semantic video annotation that integrates visual features and speech features. By combining statistics with association patterns, the relations between video shots and human concepts can be discovered effectively to conceptualize videos. In other words, high-level association rules can effectively compensate for the shortcomings of statistics-based methods when identifying broad and complex keywords in video annotation. Empirical evaluations on NIST TRECVID video datasets reveal that the proposed approach substantially improves annotation accuracy.
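
To make the idea concrete, the sketch below shows one plausible way to mine association patterns that map co-occurring feature tokens (e.g., visual codewords and speech keywords extracted from shots) to concept labels, and then to annotate a new shot by confidence-weighted rule matching. This is a minimal illustration under stated assumptions, not the authors' algorithm or the TRECVID feature pipeline: the toy data, the thresholds MIN_SUPPORT and MIN_CONFIDENCE, and the function names mine_rules and annotate are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each training shot is a set of discretized
# feature tokens (visual codewords, speech keywords) plus a concept label.
train_shots = [
    ({"grass", "crowd", "speech:goal"}, "sports"),
    ({"grass", "ball", "speech:score"}, "sports"),
    ({"desk", "face", "speech:election"}, "news"),
    ({"desk", "map", "speech:weather"}, "news"),
]

MIN_SUPPORT = 2       # minimum co-occurrence count for a pattern (assumed)
MIN_CONFIDENCE = 0.6  # minimum P(concept | pattern) (assumed)

def mine_rules(shots, max_len=2):
    """Mine rules of the form (feature itemset) -> concept label."""
    pattern_count = Counter()        # support of each itemset overall
    pattern_label_count = Counter()  # support of each (itemset, label) pair
    for features, label in shots:
        for k in range(1, max_len + 1):
            for itemset in combinations(sorted(features), k):
                pattern_count[itemset] += 1
                pattern_label_count[(itemset, label)] += 1
    rules = {}
    for (itemset, label), n in pattern_label_count.items():
        if n >= MIN_SUPPORT:
            conf = n / pattern_count[itemset]
            if conf >= MIN_CONFIDENCE:
                rules[itemset] = (label, conf)
    return rules

def annotate(features, rules):
    """Score concepts by summing confidences of all matched rules."""
    scores = Counter()
    for itemset, (label, conf) in rules.items():
        if set(itemset) <= features:  # rule antecedent is satisfied
            scores[label] += conf
    return scores.most_common(1)[0][0] if scores else None

rules = mine_rules(train_shots)
print(annotate({"grass", "crowd", "speech:goal"}, rules))  # -> "sports"
```

Summing rule confidences, as done here, is one simple way to let many weak high-level patterns jointly vote for a concept, which reflects the abstract's point that rule-based evidence can complement per-keyword statistical scores.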