Adaptive Learning for Multimodal Fusion in Video Search

  • Authors:
  • Wen-Yu Lee; Po-Tun Wu; Winston Hsu

  • Affiliations:
  • National Taiwan University, Taiwan; National Taiwan University, Taiwan; National Taiwan University, Taiwan

  • Venue:
  • PCM '09 Proceedings of the 10th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing
  • Year:
  • 2009

Abstract

Multimodal fusion has proven effective in video search, given the sheer volume of video data. State-of-the-art methods address the problem with query-dependent fusion, where modality weights vary across query classes (e.g., objects, sports, scenes, people). However, given the training queries, most prior methods rely on manually pre-defined query classes, ad-hoc query classification, and heuristically determined fusion weights, which suffer from accuracy issues and do not scale to large data. Unlike prior methods, we propose an adaptive query learning framework for multimodal fusion. For each new query, we adopt ListNet to adaptively learn the fusion weights from its semantically related training queries, selected dynamically by the k-nearest-neighbor method. ListNet is efficient and directly optimizes search ranking performance rather than classification accuracy. In general, the proposed method has the following advantages: 1) no pre-defined query classes are needed; 2) the multimodal fusion weights are learned automatically and adaptively, without ad-hoc hand-tuning; 3) the training examples are selected according to query semantics, requiring no noisy query classification. Experimenting on the large-scale TRECVID video benchmark, we show that the proposed method is scalable and competitive with prior query-dependent methods.
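
The workflow outlined in the abstract (select semantically related training queries by k-nearest neighbors, then learn per-query fusion weights with ListNet's listwise loss) can be illustrated with a minimal NumPy sketch. All names (knn_related_queries, adaptive_fusion_weights), the Euclidean distance metric, the plain gradient-descent optimizer, and the assumed data layout (per-query semantic feature vectors, per-document unimodal score matrices, relevance labels) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax (ListNet's top-one probability model)."""
    e = np.exp(x - x.max())
    return e / e.sum()

def knn_related_queries(query_feat, train_query_feats, k=5):
    """Pick the k training queries whose (assumed) semantic feature
    vectors lie closest to the new query, by Euclidean distance."""
    dists = np.linalg.norm(train_query_feats - query_feat, axis=1)
    return np.argsort(dists)[:k]

def listnet_grad(weights, modality_scores, relevance):
    """Gradient of the ListNet top-one cross-entropy loss with respect
    to the fusion weights, for one training query.

    modality_scores: (n_docs, n_modalities) unimodal search scores
    relevance:       (n_docs,) ground-truth relevance labels
    """
    p_pred = softmax(modality_scores @ weights)   # fused score distribution
    p_true = softmax(relevance.astype(float))     # target distribution
    return modality_scores.T @ (p_pred - p_true)

def adaptive_fusion_weights(query_feat, train_query_feats, train_lists,
                            n_modalities, k=5, lr=0.1, epochs=50):
    """For a new query, learn fusion weights by gradient descent on the
    ListNet loss over its k nearest training queries.

    train_lists[i] = (modality_scores_i, relevance_i) for training query i.
    """
    weights = np.full(n_modalities, 1.0 / n_modalities)  # uniform start
    neighbors = knn_related_queries(query_feat, train_query_feats, k)
    for _ in range(epochs):
        for i in neighbors:
            scores, rel = train_lists[i]
            weights -= lr * listnet_grad(weights, scores, rel)
    return weights
```

At query time, the learned weights would simply be applied as a weighted sum of the unimodal ranking scores for the new query's result list; because the weights are re-learned per query from its nearest training queries, no query-class taxonomy or hand-tuned weighting is needed.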