Recognizing visual content in unconstrained videos has become an important problem for many applications. Existing corpora for video analysis lack scale and/or content diversity, which has limited the needed progress in this critical area. In this paper, we describe and release a new database called CCV, containing 9,317 web videos over 20 semantic categories, including events like "baseball" and "parade", scenes like "beach", and objects like "cat". The database was collected with extra care to ensure relevance to consumer interest and originality of video content without post-editing. Such videos typically carry very little textual annotation and thus stand to benefit from automatic content analysis techniques. We used the Amazon Mechanical Turk (MTurk) platform to perform manual annotation, and studied the behavior and performance of human annotators on MTurk. We also compared the abilities of humans and machines in understanding consumer video content. For the latter, we implemented automatic classifiers using a state-of-the-art multi-modal approach that achieved top performance in the recent TRECVID multimedia event detection task. Results confirmed that classifiers fusing audio and video features significantly outperform single-modality solutions. We also found that humans are much better at recognizing categories of non-rigid objects such as "cat", while current automatic techniques come relatively close to humans on categories with distinctive background scenes or audio patterns.
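The multi-modal approach mentioned above combines scores from classifiers trained separately on audio and visual features. As a minimal illustration of score-level (late) fusion, the sketch below averages per-category scores from two hypothetical single-modality classifiers; the category names match the paper's examples, but the score values and fusion weights are purely illustrative, not results from the paper.

```python
# Hedged sketch of late (score-level) fusion across two modalities.
# All numbers and weights below are illustrative assumptions.

def fuse_scores(visual_scores, audio_scores, w_visual=0.6, w_audio=0.4):
    """Weighted average of per-category scores from two modality classifiers."""
    assert len(visual_scores) == len(audio_scores)
    return [w_visual * v + w_audio * a
            for v, a in zip(visual_scores, audio_scores)]

# Hypothetical per-category confidence scores from independently trained
# classifiers (e.g., a visual model and an audio model for the same clip).
visual = {"baseball": 0.80, "beach": 0.55, "cat": 0.30}
audio  = {"baseball": 0.60, "beach": 0.70, "cat": 0.20}

categories = sorted(visual)
fused = dict(zip(categories,
                 fuse_scores([visual[c] for c in categories],
                             [audio[c] for c in categories])))
best = max(fused, key=fused.get)
print(best, round(fused[best], 2))  # -> baseball 0.72
```

In practice each modality's scores would come from a trained classifier (the paper reports SVM-based classifiers), and the fusion weights would be tuned on validation data rather than fixed by hand.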