Combining content-based analysis and crowdsourcing to improve user interaction with zoomable video

Authors:
Axel Carlier;Guntur Ravindra;Vincent Charvillat;Wei Tsang Ooi
Affiliations:
University of Toulouse, Toulouse, France;National University of Singapore, Singapore, Singapore;University of Toulouse, Toulouse, France;National University of Singapore, Singapore, Singapore
Venue:
MM '11 Proceedings of the 19th ACM international conference on Multimedia
Year:
2011

Abstract

This paper introduces a new paradigm for interacting with zoomable video. Our interaction technique reduces the number of zooms and pans required by providing recommended viewports to users, replacing multiple zoom and pan actions with a single click on a recommended viewport. The usefulness of the technique depends on the quality of the recommended viewports, which need to match the user's intention, track movement in the scene, and properly frame the scene in the video. To this end, we propose a hybrid method in which content analysis is complemented by the implicit feedback of a community of users in order to recommend viewports. We first compute preliminary sets of recommended viewports by analyzing the content of the video. These viewports allow tracking of moving objects in the scene and are framed without violating basic aesthetic rules. To improve the relevance of the recommended viewports, we collect viewing statistics as users view a video, and use the viewports they select to reinforce the importance of certain recommendations and penalize others. New recommendations that were not previously recognized by content analysis may also emerge. The resulting recommended viewports converge towards the regions in the video that are relevant to users. A user study involving 70 participants shows that a user interface incorporating our paradigm leads to more zooms, into more informative regions, with fewer interactions.
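
The reinforce-and-penalize idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how selections are matched to recommendations or how importance is updated, so the `Viewport` class, the overlap test (intersection over union), and the `match`, `reward`, and `decay` parameters below are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Viewport:
    """Axis-aligned rectangle in video pixel coordinates (hypothetical model)."""
    x: float
    y: float
    w: float
    h: float
    weight: float = 1.0  # assumed importance score, not from the paper


def iou(a: Viewport, b: Viewport) -> float:
    """Intersection over union of two rectangles, in [0, 1]."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def update(candidates, selected, match=0.5, reward=1.0, decay=0.9):
    """Apply one user-selected viewport to the candidate list, in place.

    All weights decay (penalty). If the selection overlaps an existing
    candidate well enough, that candidate is rewarded; otherwise the
    selection is added as a new candidate.
    """
    best = max(candidates, key=lambda c: iou(c, selected), default=None)
    matched = best is not None and iou(best, selected) >= match
    for c in candidates:
        c.weight *= decay
    if matched:
        best.weight += reward
    else:
        candidates.append(
            Viewport(selected.x, selected.y, selected.w, selected.h, reward)
        )
    return candidates


def recommend(candidates, k=1):
    """Return the k highest-weighted candidates."""
    return sorted(candidates, key=lambda c: c.weight, reverse=True)[:k]


# Two candidates, standing in for the output of content analysis.
candidates = [Viewport(0, 0, 100, 100), Viewport(200, 200, 100, 100)]
update(candidates, Viewport(10, 10, 100, 100))   # overlaps the first candidate
update(candidates, Viewport(500, 500, 50, 50))   # matches nothing, becomes new
top = recommend(candidates)[0]
```

In this sketch, candidates that users keep selecting accumulate weight, unselected ones fade, and a selection that matches no candidate enters the list as a new recommendation, mirroring the behavior the abstract describes at a high level.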