Multimodal semantics extraction from user-generated videos

  • Authors:
  • Francesco Cricri; Kostadin Dabov; Mikko J. Roininen; Sujeet Mate; Igor D. D. Curcio; Moncef Gabbouj

  • Affiliations:
  • Francesco Cricri, Kostadin Dabov, Mikko J. Roininen, and Moncef Gabbouj: Department of Signal Processing, Tampere University of Technology, Tampere, Finland. Sujeet Mate and Igor D. D. Curcio: Nokia Research Center, Tampere, Finland.

  • Venue:
  • Advances in Multimedia
  • Year:
  • 2012

Abstract

User-generated video content has grown so rapidly that it now outpaces professional content creation. In this work we develop methods that analyze contextual information from multiple user-generated videos in order to obtain semantic information about the public happenings (e.g., sport and live music events) being recorded. One of the key contributions of this work is the joint utilization of different data modalities, including those captured by auxiliary sensors on each user's recording device. In particular, we analyze GPS data, magnetometer data, accelerometer data, and video and audio content. We use these modalities to infer information about the recorded event in terms of its layout (e.g., stadium), genre, indoor versus outdoor scene, and main area of interest. Furthermore, we propose a method that automatically identifies the optimal set of cameras to be used in a multicamera video production. Finally, we detect the camera users who fall within the field of view of other cameras recording at the same public happening. We show that the proposed multimodal analysis methods perform well on various recordings obtained at real sport events and live music performances.
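
The abstract names the modalities but not the algorithms. Still, the last contribution (detecting camera users who fall inside another camera's field of view) lends itself to a concrete illustration: GPS supplies each user's position and the magnetometer supplies each camera's pointing direction, so the test reduces to comparing the bearing toward the other user against the camera heading. The sketch below is a minimal geometric version under assumptions of my own: the function names, the 60° default horizontal field of view, and the pure bearing-versus-heading test are illustrative, not the authors' actual method.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees in [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0

def in_field_of_view(cam_lat, cam_lon, cam_heading_deg,
                     subj_lat, subj_lon, fov_deg=60.0):
    """True if a subject's GPS position lies inside the camera's horizontal FOV.

    cam_heading_deg -- magnetometer-derived pointing direction, degrees from north
    fov_deg         -- assumed horizontal field of view of the device (hypothetical)
    """
    target = bearing_deg(cam_lat, cam_lon, subj_lat, subj_lon)
    # Smallest angular difference between heading and bearing, folded to [0, 180].
    diff = abs((target - cam_heading_deg + 180.0) % 360.0 - 180.0)
    return diff <= fov_deg / 2.0

if __name__ == "__main__":
    # Two hypothetical users at a stadium: camera A points roughly east (90 degrees)
    # and user B stands just north-east of A, so B should be inside A's view.
    print(in_field_of_view(61.4940, 23.7750, 90.0, 61.4941, 23.7760))   # True
    print(in_field_of_view(61.4940, 23.7750, 270.0, 61.4941, 23.7760))  # False
```

A real system would additionally have to handle magnetometer calibration and GPS noise, which the paper's multimodal analysis operates under; this sketch only captures the geometric core of the test.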