We present a hierarchical, multi-modal approach for placing Flickr videos on the map. The approach exploits external resources to identify toponyms in the metadata, and visual and textual features to identify similar content. First, a geographical boundary extraction method identifies the country and its extent. Using a database of more than 3.6 million geotagged Flickr images, we group images into geographical regions and build a hierarchical model. A fusion of visual and textual methods then classifies each video into a set of candidate regions. Next, a nearest-neighbour method finds visual correspondences with the training images inside those pre-classified regions; each video sequence is represented by low-level feature vectors extracted from multiple key frames. Each test video is tagged with the geo-information of the visually most similar training item within the regions retained by the pre-classification step. The results show that we correctly locate one third of the videos within an error of 1 km.
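The final step described above can be sketched as a restricted nearest-neighbour search: among training images whose region survived the pre-classification stage, pick the one visually closest to any key frame of the test video and copy its geo-tag. This is a minimal illustration only; the names (`TrainItem`, `candidate_regions`, `geotag`), the L2 distance, and the flat data layout are assumptions for the sketch, not the authors' actual implementation.

```python
# Hypothetical sketch of nearest-neighbour geotagging within
# pre-classified regions; not the paper's actual code.
from dataclasses import dataclass

import numpy as np


@dataclass
class TrainItem:
    features: np.ndarray   # low-level visual feature vector
    lat: float             # geo-tag of the training image
    lon: float
    region: int            # region assigned by the hierarchical model


def geotag(key_frame_features, train, candidate_regions):
    """Tag a video with the location of the visually nearest training
    image, restricted to the pre-classified candidate regions."""
    # Keep only training items from regions the fusion step accepted.
    pool = [t for t in train if t.region in candidate_regions]
    best, best_dist = None, float("inf")
    for frame in key_frame_features:        # multiple key frames per video
        for t in pool:
            d = float(np.linalg.norm(frame - t.features))  # L2 distance
            if d < best_dist:
                best, best_dist = t, d
    return best.lat, best.lon
```

In practice the region filter is what makes the search tractable: it shrinks the candidate pool from millions of images to those inside a few regions before any visual distance is computed.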