In this paper, we propose two novel semantic event detection models: the Two-dependence Bayesian Network (2d-BN) and Conditional Random Fields (CRFs). The 2d-BN is a simplified Bayesian Network classifier that characterizes the relationships among features well and can be trained more efficiently than traditional, more complex Bayesian Networks. CRFs are undirected probabilistic graphical models that offer several advantages, including the ability to relax the strong independence assumptions made on state transitions and to avoid a fundamental limitation of directed probabilistic graphical models. Based on multi-modality fusion and a mid-level keyword representation, we use a three-level framework to detect semantic events: the low level extracts audiovisual features, the mid level detects semantic keywords, and the high level infers events using the 2d-BN and CRF models. Extensive experiments, including comparisons with state-of-the-art methods, demonstrate the effectiveness of the two proposed models.
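To make the high-level inference step more concrete, the following is a minimal sketch of how a linear-chain CRF could decode an event label sequence from a sequence of mid-level keywords. It is not the authors' implementation: the keyword names (`far_view`, `excited_speech`, `replay`), the event labels, and all weights are hypothetical, and parameter training is omitted entirely. The function `viterbi` simply finds the label sequence with the highest unnormalized log-score under assumed emission and transition weights.

```python
from typing import Dict, List, Tuple

Weights = Dict[Tuple[str, str], float]


def viterbi(keywords: List[str], labels: List[str],
            emit: Weights, trans: Weights) -> List[str]:
    """Return the highest-scoring label sequence for a keyword sequence.

    emit[(label, keyword)] and trans[(prev_label, label)] are log-linear
    weights; any pair missing from the dictionaries scores 0.0.
    """
    if not keywords:
        return []
    # score[t][y]: best unnormalized log-score of a path ending in y at step t
    score = [{y: emit.get((y, keywords[0]), 0.0) for y in labels}]
    back = []  # back[t - 1][y]: best predecessor of label y at step t
    for t in range(1, len(keywords)):
        row, ptr = {}, {}
        for y in labels:
            prev = max(labels,
                       key=lambda p: score[-1][p] + trans.get((p, y), 0.0))
            row[y] = (score[-1][prev] + trans.get((prev, y), 0.0)
                      + emit.get((y, keywords[t]), 0.0))
            ptr[y] = prev
        score.append(row)
        back.append(ptr)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: score[-1][y])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical labels, keywords, and weights, chosen only for illustration.
LABELS = ["play", "goal"]
EMIT = {("play", "far_view"): 1.0,
        ("goal", "excited_speech"): 2.0,
        ("goal", "replay"): 1.5}
TRANS = {("play", "play"): 0.5, ("goal", "goal"): 0.5}

shots = ["far_view", "excited_speech", "replay", "far_view"]
print(viterbi(shots, LABELS, EMIT, TRANS))
# → ['play', 'goal', 'goal', 'play']
```

Because the transition weights score adjacent labels jointly, the decoded label for each shot depends on its neighbours as well as on its own keyword, which is the property that distinguishes sequence models from per-shot classifiers.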