Accurately annotating entities in video is labor-intensive and expensive. As the quantity of online video grows, traditional solutions to this task cannot scale to meet the needs of researchers with limited budgets. Current practice offers only a stopgap: dedicated workers are paid to label a fraction of the frames, and the remaining frames are filled in by linear interpolation. As budgets and scale force key frames to become sparser, the assumption of linearity fails and the labels become inaccurate. To address this problem, we have created a public framework that divides the work of labeling video data into micro-tasks that can be completed by the large labor pools available through crowdsourcing marketplaces. By extracting pixel-based features from manually labeled entities, we can use more sophisticated interpolation between key frames to maximize performance for a given budget. Finally, by validating our framework on difficult, real-world data sets, we demonstrate an inherent trade-off between the mix of human and cloud computing used and the accuracy and cost of the labeling.
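To make the baseline concrete, the following is a minimal sketch of linear interpolation between sparsely labeled key frames, the practice the abstract describes as breaking down when key frames become sparse. It is not the framework's feature-based interpolation. The function name `interpolate_boxes`, the `(x_min, y_min, x_max, y_max)` box layout, and the sample coordinates are assumptions made for illustration only.

```python
from typing import Dict, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def interpolate_boxes(key_frames: Dict[int, Box]) -> Dict[int, Box]:
    """Fill in a box for every frame between manually labeled key frames.

    Each coordinate is assumed to move at constant speed between two
    consecutive key frames (the linearity assumption).
    """
    if not key_frames:
        return {}
    frames = sorted(key_frames)
    labels: Dict[int, Box] = {frames[0]: key_frames[frames[0]]}
    for start, end in zip(frames, frames[1:]):
        a, b = key_frames[start], key_frames[end]
        span = end - start
        for f in range(start + 1, end + 1):
            t = (f - start) / span  # fraction of the way from start to end
            labels[f] = tuple(p + t * (q - p) for p, q in zip(a, b))
    return labels


# Two human-labeled key frames, ten frames apart (illustrative values).
labels = interpolate_boxes({
    0: (0.0, 0.0, 10.0, 10.0),
    10: (20.0, 0.0, 30.0, 10.0),
})
print(labels[5])
```

With key frames this far apart, any entity that accelerates, turns, or changes size between frame 0 and frame 10 will be mislabeled in the intermediate frames, which is the failure mode that motivates a more sophisticated interpolation.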