An Efficient Boosting Algorithm for Combining Preferences
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Learning query-class dependent weights in automatic video retrieval
MM '04 Proceedings of the 12th annual ACM international conference on Multimedia
Learning the semantics of multimedia queries and concepts from a small number of examples
MM '05 Proceedings of the 13th annual ACM international conference on Multimedia
Automatic discovery of query-class-dependent models for multimodal search
MM '05 Proceedings of the 13th annual ACM international conference on Multimedia
Learning to rank using gradient descent
ICML '05 Proceedings of the 22nd international conference on Machine learning
Probabilistic latent query analysis for combining multiple retrieval sources
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Learning to rank: from pairwise approach to listwise approach
ICML '07 Proceedings of the 24th international conference on Machine learning
AdaRank: a boosting algorithm for information retrieval
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Query dependent ranking using K-nearest neighbor
SIGIR '08 Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
ContextSeer: context search and recommendation at query time for shared consumer photos
MM '08 Proceedings of the 16th ACM international conference on Multimedia
Keyword-based concept search on consumer photos by web-based kernel function
MM '08 Proceedings of the 16th ACM international conference on Multimedia
Multimodal fusion has proven effective in video search, given the sheer volume of video data. State-of-the-art methods address the problem with query-dependent fusion, where modality weights vary across query classes (e.g., object, sports, scene, people). However, given the training queries, most prior methods rely on manually pre-defined query classes, ad hoc query-class classification, and heuristically determined fusion weights, which suffer from accuracy issues and do not scale to large data sets. Unlike prior methods, we propose an adaptive query learning framework for multimodal fusion. For each new query, we adopt ListNet to adaptively learn the fusion weights from semantically related training queries, dynamically selected by the K-nearest-neighbor method. ListNet directly optimizes ranking performance rather than classification accuracy. The proposed method has the following advantages: 1) no pre-defined query classes are needed; 2) the multimodal fusion weights are learned automatically and adaptively, without ad hoc hand-tuning; 3) training queries are selected by query semantics, requiring no noisy query classification. Experiments on the large-scale TRECVID video benchmarks show that the proposed method is scalable and competitive with prior query-dependent methods.
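The per-query learning step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: query similarity is assumed to be cosine similarity over hypothetical query feature vectors, the fusion model is a linear combination of per-modality retrieval scores, and ListNet's top-one cross-entropy loss is minimized by batch gradient descent over the selected neighbor queries.

```python
import numpy as np

def select_neighbor_queries(q_feat, train_feats, k):
    """Pick the K training queries most similar to the new query
    (cosine similarity over assumed query feature vectors)."""
    sims = train_feats @ q_feat / (
        np.linalg.norm(train_feats, axis=1) * np.linalg.norm(q_feat) + 1e-12)
    return np.argsort(-sims)[:k]

def listnet_grad(w, X, y):
    """ListNet top-one cross-entropy loss and gradient for one training query.
    X: (n_docs, n_modalities) per-modality retrieval scores; y: relevance labels."""
    z = X @ w
    p_y = np.exp(y - y.max()); p_y /= p_y.sum()   # target top-one distribution
    p_z = np.exp(z - z.max()); p_z /= p_z.sum()   # model top-one distribution
    loss = -np.sum(p_y * np.log(p_z + 1e-12))
    grad = X.T @ (p_z - p_y)                      # d loss / d w
    return loss, grad

def learn_fusion_weights(neighbor_queries, n_modalities, lr=0.1, iters=300):
    """Learn fusion weights by batch gradient descent on the summed
    ListNet loss of the selected neighbor queries."""
    w = np.full(n_modalities, 1.0 / n_modalities)
    for _ in range(iters):
        g = np.zeros_like(w)
        for X, y in neighbor_queries:
            g += listnet_grad(w, X, y)[1]
        w -= lr * g / len(neighbor_queries)
    return w
```

At query time, the learned weights fuse the per-modality score lists as `X @ w`, and documents are ranked by the fused score.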