We address the problem of complicated event categorization in a large dataset of videos "in the wild", where multiple classifiers are applied independently and each assigns every video a 'likelihood' score. The core contribution of this paper is a local expert forest model for meta-level score fusion, designed for event detection under heavily imbalanced class distributions. Our motivation is to adapt to variations in classifier performance across different regions of the score space, using a divide-and-conquer technique. We propose a novel method for partitioning the likelihood space that is sensitive to local label distributions in imbalanced data, and we train a pair of locally optimized experts for each partition. Multiple pairs of experts, each based on a different partition (a 'tree'), form a 'forest' that balances local adaptivity against over-fitting. As a result, our model disregards classifiers in regions of the score space where they perform poorly, achieving both local source selection and fusion. We experiment with the TRECVID Multimedia Event Detection (MED) dataset, detecting 15 complicated events in around 34k video clips totaling more than 1000 hours, and demonstrate superior performance compared to other score-level fusion methods.
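The divide-and-conquer idea described above can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify the partitioning rule, the form of the experts, or how trees are combined, so every such choice below is an assumption made for illustration. Here each hypothetical 'tree' splits the score space along one randomly chosen classifier's score at a quantile of the positive samples (a crude way of staying sensitive to the rare class), each expert is a class-weighted logistic regression, and the 'forest' simply averages the trees' outputs. All function names are invented for this example.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def fit_expert(X, y, steps=500, lr=0.2):
    """Class-weighted logistic regression (assumed stand-in for a local expert)."""
    n, k = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])           # append a bias column
    n_pos = max(float(np.sum(y == 1)), 1.0)
    n_neg = max(float(np.sum(y != 1)), 1.0)
    # Each class gets half of the total weight, countering class imbalance.
    sample_w = np.where(y == 1, 0.5 / n_pos, 0.5 / n_neg)
    w = np.zeros(k + 1)
    for _ in range(steps):
        p = _sigmoid(Xb @ w)
        w -= lr * (Xb.T @ (sample_w * (p - y)))    # weighted log-loss gradient
    return w

def predict_expert(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return _sigmoid(Xb @ w)

def fit_tree(X, y, rng):
    """One partition of the score space plus its pair of local experts."""
    dim = int(rng.integers(X.shape[1]))            # which classifier's score to split on
    # Assumption: split at a random quantile of the positives' scores, so that
    # both regions receive some of the rare positive samples.
    thr = np.quantile(X[y == 1, dim], rng.uniform(0.3, 0.7))
    left = X[:, dim] <= thr
    experts = (fit_expert(X[left], y[left]), fit_expert(X[~left], y[~left]))
    return dim, thr, experts

def predict_tree(tree, X):
    dim, thr, (w_left, w_right) = tree
    left = X[:, dim] <= thr
    out = np.empty(len(X))
    out[left] = predict_expert(w_left, X[left])    # route each sample to its region's expert
    out[~left] = predict_expert(w_right, X[~left])
    return out

def fit_forest(X, y, n_trees=10, seed=0):
    rng = np.random.default_rng(seed)
    return [fit_tree(X, y, rng) for _ in range(n_trees)]

def predict_forest(forest, X):
    """Fused score: plain average over trees (an assumed combination rule)."""
    return np.mean([predict_tree(t, X) for t in forest], axis=0)
```

Under these assumptions, `fit_forest(X, y)` takes an `(n_videos, n_classifiers)` array of per-classifier scores and a 0/1 label vector containing at least one positive, and `predict_forest` returns one fused score per video. Because each expert learns its own weights within its region, a classifier that is uninformative there can receive a weight near zero, which loosely mirrors the local source selection the abstract describes.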