Multimedia Event Detection (MED) is a multimedia retrieval task whose goal is to find videos of a particular event in a large-scale Internet video archive, given example videos and text descriptions. We focus on multimodal knowledge-based analysis in MED, where we use meaningful semantic features such as Automatic Speech Recognition (ASR) transcripts, acoustic concept indexing (42 acoustic concepts), and visual semantic indexing (346 visual concepts) to characterize the videos in the archive. We study two scenarios, in which we either do or do not use the provided example videos. In the former, we propose a novel Adaptive Semantic Similarity (ASS) measure for the textual similarity between the ASR transcripts of videos. We also incorporate acoustic concept indexing and classification to retrieve test videos, especially those with few spoken words. In the latter, 'ad-hoc' scenario, where no example videos are available, we use only the event kit description to retrieve test videos via their ASR transcripts and visual semantics. We also propose an event-specific fusion scheme that combines the textual and visual retrieval outputs. Our results show the effectiveness of the proposed ASS and acoustic concept indexing methods and their complementary roles. We also conduct a set of experiments to assess the proposed framework in the 'ad-hoc' scenario.
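As an illustration of the kind of event-specific fusion the abstract describes, the sketch below performs a weighted late fusion of per-video textual and visual retrieval scores. The function name, the per-event weight `w_text`, and the toy score values are assumptions made up for this example; the paper's actual fusion scheme is not specified here.

```python
def fuse_scores(text_scores, visual_scores, w_text):
    """Event-specific late fusion of two retrieval modalities.

    text_scores / visual_scores: dict mapping video id -> score in [0, 1].
    w_text: weight of the textual modality, tuned per event (assumption).
    Videos missing from one modality get a score of 0.0 for it.
    """
    videos = set(text_scores) | set(visual_scores)
    return {
        v: w_text * text_scores.get(v, 0.0)
           + (1.0 - w_text) * visual_scores.get(v, 0.0)
        for v in videos
    }

# Toy example: fuse text-based and visual-concept-based scores,
# then rank test videos by the fused score.
text = {"vid1": 0.9, "vid2": 0.1}
visual = {"vid1": 0.2, "vid3": 0.7}
fused = fuse_scores(text, visual, w_text=0.6)
ranking = sorted(fused, key=fused.get, reverse=True)  # → ['vid1', 'vid3', 'vid2']
```

A per-event weight lets events that are well described by spoken content lean on ASR-based retrieval, while visually distinctive events lean on the visual concept scores.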