The overwhelming number of multimedia entities shared on the web has given rise to the need for their semantic identification and classification. Numerous research efforts have tackled this problem by developing advanced content-analysis techniques and by leveraging readily available tags, scripts, and blogs related to these entities. In many cases, however, especially for event detection and action recognition, such efforts have been hampered by the lack of large-scale, publicly available benchmarks. To address this problem, this paper presents MovieBase, a large-scale movie corpus that covers full-length feature movies as well as a large volume of movie-related video clips downloaded from YouTube. The corpus is designed for research in event detection and action recognition. It offers over 71 hours of video with a total of 69,129 shots, hand-labeled with 7 audio and 11 visual concept tags that semantically define 11 event categories within romantic and violent scenes. The corpus comes with a set of pre-extracted low-level visual, motion, and audio features as well as high-level features. Related results are provided as a baseline for the movie event detection task.