Event recognition: viewing the world with a third eye

  • Authors:
  • Jiebo Luo, Jie Yu, Dhiraj Joshi, Wei Hao

  • Affiliations:
  • Eastman Kodak Company, Rochester, NY, USA (all authors)

  • Venue:
  • MM '08: Proceedings of the 16th ACM international conference on Multimedia
  • Year:
  • 2008

Abstract

Semantic event recognition based only on visual cues is a challenging problem. The problem is particularly acute when the application domain is unconstrained still images available on the Internet or in personal repositories. In recent years, it has been shown that metadata captured with pictures can provide valuable contextual cues complementary to the image content and can be used to improve classification performance. With the recent geotagging phenomenon, GPS coordinates have become an important piece of metadata available with many pictures now on the World Wide Web. In this study, we obtain satellite images corresponding to picture location data and investigate their novel use to recognize the picture-taking environment, as if through a third eye above the scene. Additionally, we combine this inference with classical vision-based event detection methods and study the synergistic fusion of the two approaches. We employ color- and structure-based visual vocabularies to characterize ground and satellite images, respectively. The satellite image classifiers are trained with a multiclass AdaBoost engine, while the ground image classifiers are trained with SVMs. Modeling and prediction involve some of the most interesting semantic event-activity classes encountered in consumer pictures, including those that occur in residential areas, commercial areas, beaches, sports venues, and parks. The fusion of the two complementary views achieves a significant performance improvement over the ground-view baseline. With integrated GPS-capable cameras on the horizon, we believe that our line of research can revolutionize event recognition and media annotation in years to come.
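
As a concrete illustration of the two-view pipeline the abstract describes, the following is a minimal sketch using scikit-learn: an SVM over ground-view features, multiclass AdaBoost over satellite-view features, and score-level fusion of their class posteriors. The event labels, feature representations, and the convex-combination fusion rule are illustrative assumptions for this sketch, not the paper's exact implementation.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier

# Illustrative event-activity classes, following the abstract.
EVENTS = ["residential", "commercial", "beach", "sports", "park"]

# Ground view: an SVM over color-based bag-of-visual-words histograms
# extracted from the photograph itself (probability=True enables the
# posterior estimates needed for score-level fusion).
ground_clf = SVC(kernel="rbf", probability=True)

# Satellite view: multiclass AdaBoost (decision stumps by default) over
# structure-based visual-word histograms computed from the satellite
# tile at the photo's GPS coordinates.
satellite_clf = AdaBoostClassifier(n_estimators=200)

def train(X_ground, X_satellite, y):
    """Fit both single-view classifiers on paired training examples.
    y holds integer indices into EVENTS, with all classes present."""
    ground_clf.fit(X_ground, y)
    satellite_clf.fit(X_satellite, y)

def predict_event(x_ground, x_satellite, alpha=0.5):
    """Fuse the two views with a convex combination of posteriors;
    alpha weights the ground view. This linear rule is one plausible
    fusion scheme, not necessarily the one used in the paper."""
    p_ground = ground_clf.predict_proba(x_ground.reshape(1, -1))[0]
    p_sat = satellite_clf.predict_proba(x_satellite.reshape(1, -1))[0]
    fused = alpha * p_ground + (1.0 - alpha) * p_sat
    return EVENTS[int(np.argmax(fused))]
```

In practice the fusion weight alpha would be tuned on held-out validation data, and more elaborate fusion schemes (e.g., a second-stage classifier over both score vectors) are possible.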