Recent research in video retrieval has been successful at finding videos when the query consists of tens or hundreds of sample relevant videos for training supervised models. Instead, we investigate unsupervised zero-shot retrieval, where no training videos are provided: a query consists only of a text statement. For retrieval, we use text extracted from images in the videos, text recognized in the speech of their audio tracks, and semantically meaningful visual concepts that are automatically detected in the videos with widely varying confidence. In this work we introduce a new method for automatically identifying relevant concepts given a text query, using the Markov Random Field (MRF) retrieval framework. We use source expansion to build rich textual representations of semantic video concepts from large external sources such as the web. We find that concept-based retrieval significantly outperforms text-based approaches in recall. Using an evaluation derived from the TRECVID MED'11 track, we present early results showing that multi-modal fusion can compensate for inadequacies in each modality, resulting in substantial effectiveness gains. With relevance feedback, our approach provides additional improvements of over 50%.
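To make the core idea concrete, the following minimal sketch ranks semantic video concepts against a text query by matching the query against textual descriptions of each concept. This is only an illustration of the general concept-matching step, not the authors' MRF-based method: the concept names, descriptions, and the simple term-frequency cosine similarity are all assumptions standing in for the paper's source-expanded representations and retrieval model.

```python
# Illustrative sketch: score semantic video concepts against a text query
# by cosine similarity over term-frequency vectors of textual concept
# descriptions. All concept names and descriptions here are hypothetical,
# standing in for the "source-expanded" representations described above.
import math
from collections import Counter

# Hypothetical textual descriptions of visual concepts (in the paper these
# would be built from large external sources such as the web).
concept_docs = {
    "vehicle": "car truck road driving wheels engine traffic motor",
    "animal": "dog cat bird fur pet wildlife zoo feathers",
    "kitchen": "cooking food stove pan recipe oven chef meal",
}

def tf_vector(text):
    """Bag-of-words term-frequency vector for a whitespace-tokenized text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    denom = (math.sqrt(sum(v * v for v in a.values()))
             * math.sqrt(sum(v * v for v in b.values())))
    return num / denom if denom else 0.0

def rank_concepts(query, docs):
    """Return concepts sorted by descending similarity to the query."""
    q = tf_vector(query)
    scored = [(name, cosine(q, tf_vector(doc))) for name, doc in docs.items()]
    return sorted(scored, key=lambda pair: -pair[1])

ranking = rank_concepts("a person cooking a meal on a stove", concept_docs)
# The top-ranked concept here is "kitchen", since its description shares
# the terms "cooking", "meal", and "stove" with the query.
```

The highest-scoring concepts would then serve as the query representation for retrieval over the automatically detected concepts in each video.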