In recent years, the problem of video location estimation (i.e., estimating the longitude/latitude coordinates of a video without GPS information) has been approached with diverse methods and ideas in the research community, and significant improvements have been made. So far, however, systems have only been compared against each other, and no systematic study of human performance has been conducted. Based on a human-subject study with 11,900 experiments, this article presents a human baseline for location estimation for different combinations of modalities (audio, audio/video, audio/video/text). Furthermore, this article compares state-of-the-art location estimation systems with the human baseline. Although the overall performance of humans at multimodal video location estimation is better than that of current machine learning approaches, the difference is quite small: for 41% of the test set, the machine's accuracy was superior to the humans'. We present case studies and discuss why machines did better for some videos and not for others. Our analysis suggests new directions and priorities for future work on improving location inference algorithms.
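A per-video comparison of the kind summarized above can be sketched as follows. This is a minimal illustration, not the article's evaluation code: it assumes that accuracy is measured as the great-circle (haversine) distance between an estimated and a true coordinate, which the abstract does not state, and the names `haversine_km` and `fraction_machine_better` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def fraction_machine_better(truth, human, machine):
    """Share of videos where the machine estimate is strictly closer
    to the true location than the human estimate.

    Each argument is a list of (lat, lon) tuples, aligned by video.
    """
    wins = 0
    for t, h, m in zip(truth, human, machine):
        if haversine_km(*t, *m) < haversine_km(*t, *h):
            wins += 1
    return wins / len(truth)


# Toy data (invented for illustration, not taken from the study).
truth = [(0.0, 0.0), (10.0, 10.0)]
human = [(1.0, 1.0), (10.0, 10.0)]
machine = [(0.0, 0.5), (20.0, 20.0)]
share = fraction_machine_better(truth, human, machine)
```

Under this assumed metric, a result such as "the machine was superior for 41% of the test set" corresponds to `fraction_machine_better` returning 0.41 on the full set of test videos.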