Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. CVPR '06: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2.
Multi-modality web video categorization. Proceedings of the International Workshop on Multimedia Information Retrieval.
Speeded-Up Robust Features (SURF). Computer Vision and Image Understanding.
Speech Processing for Audio Indexing. GoTAL '08: Proceedings of the 6th International Conference on Advances in Natural Language Processing.
TubeFiler: an automatic web video categorizer. MM '09: Proceedings of the 17th ACM International Conference on Multimedia.
CEDD: color and edge directivity descriptor: a compact descriptor for image indexing and retrieval. ICVS '08: Proceedings of the 6th International Conference on Computer Vision Systems.
Evaluating Color Descriptors for Object and Scene Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Content-based video genre classification using multiple cues. Proceedings of the 3rd International Workshop on Automated Information Extraction in Media Production.
Automatic tagging and geotagging in video collections and communities. Proceedings of the 1st ACM International Conference on Multimedia Retrieval.
SBNMA '11: Proceedings of the 2011 ACM Workshop on Social and Behavioural Networked Media Access.
Content-based video description for automatic video genre categorization. MMM '12: Proceedings of the 18th International Conference on Advances in Multimedia Modeling.
Automatic Video Classification: A Survey of the Literature. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews.
Intent and its discontents: the user at the wheel of the online video search engine. Proceedings of the 20th ACM International Conference on Multimedia.
This paper describes the possibilities of cross-modal classification of multimedia documents on social media platforms. Our framework predicts the user-chosen category of consumer-produced video sequences based on their textual and visual features. The text resources, which include metadata and automatic speech recognition transcripts, are represented as bags of words, and the video content is represented as a bag of clustered local visual features. We investigate the contribution of the different modalities and how they should be combined when sequences lack certain resources. To this end, several classification methods are evaluated with varying resources. The best approach achieves a mean average precision of 0.3977 using user-contributed metadata in combination with clustered SURF.
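A minimal sketch of the kind of pipeline the abstract describes: metadata as a bag of words, local visual descriptors quantized into a clustered visual vocabulary, and the two representations fused before classification. The toy metadata, labels, and random descriptors here are illustrative stand-ins (the paper uses real SURF descriptors and user-contributed metadata), and the early-fusion-by-concatenation step is one plausible combination strategy, not necessarily the authors' exact method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Toy metadata for four videos (stand-ins for user-contributed tags/titles).
texts = ["cat pet funny", "dog pet park", "soccer goal sports", "tennis match sports"]
labels = [0, 0, 1, 1]  # 0 = animals, 1 = sports (hypothetical categories)

# Text modality: bag of words over the metadata.
text_bow = CountVectorizer().fit_transform(texts).toarray()

# Visual modality: each video yields a set of local descriptors.
# Random 8-dim vectors stand in for SURF descriptors here.
descriptors = [rng.normal(size=(30, 8)) for _ in texts]

# Build a visual vocabulary by clustering all descriptors pooled together.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(np.vstack(descriptors))

# Represent each video as a histogram over the visual words.
visual_bow = np.array(
    [np.bincount(kmeans.predict(d), minlength=5) for d in descriptors]
)

# Early fusion: concatenate both bag representations, then classify.
features = np.hstack([text_bow, visual_bow])
clf = LinearSVC().fit(features, labels)
preds = clf.predict(features)  # training-set predictions
```

When one modality is missing (e.g. no speech transcript), the corresponding sub-vector can be zeroed or the classifier retrained on the remaining features, which is the kind of resource-dependent combination the paper evaluates.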