Online multimodal deep similarity learning with application to image retrieval

Authors:
Pengcheng Wu; Steven C.H. Hoi; Hao Xia; Peilin Zhao; Dayong Wang; Chunyan Miao
Affiliations:
Nanyang Technological University, Singapore (all six authors)
Venue:
Proceedings of the 21st ACM International Conference on Multimedia
Year:
2013

Abstract

Recent years have witnessed extensive studies on distance metric learning (DML) for improving similarity search in multimedia information retrieval tasks. Despite their successes, most existing DML methods suffer from two critical limitations: (i) they typically learn a linear distance function on the input feature space, and this linearity assumption limits their capacity to measure similarity between complex patterns in real-world applications; (ii) they are often designed to learn distance metrics on uni-modal data, and so may not effectively measure the similarity of multimedia objects with multimodal representations. To address these limitations, we propose a novel framework of online multimodal deep similarity learning (OMDSL), which aims to optimally integrate multiple deep neural networks pretrained with stacked denoising autoencoders. In particular, the proposed framework explores a unified two-stage online learning scheme that consists of (i) learning a flexible nonlinear transformation function for each individual modality, and (ii) learning the optimal combination of multiple diverse modalities, with both carried out simultaneously in one coherent process. We conduct an extensive set of experiments to evaluate the proposed algorithms on multimodal image retrieval tasks, and the encouraging results validate the effectiveness of the proposed technique.
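The abstract does not give the update rules, so the following is only a minimal sketch, under our own assumptions, of the two-stage online idea it describes. It is not the authors' OMDSL algorithm. The assumptions are as follows. Each modality's "deep network" is replaced by a single randomly initialised tanh layer, with no stacked denoising autoencoder pretraining. Per-modality similarity is the inner product of the embeddings. Stage (i) is online gradient descent on a triplet hinge loss. Stage (ii) is a Hedge-style multiplicative weight update over modalities. All names (`ModalityNet`, `OnlineMultimodalSimilarity`, `beta`, `lr`, `margin`) are hypothetical.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


class ModalityNet:
    """Hypothetical per-modality map phi(x) = tanh(W x) with similarity phi(a) . phi(b).

    Assumption: one randomly initialised tanh layer stands in for the deep network
    that the paper pretrains with stacked denoising autoencoders.
    """

    def __init__(self, in_dim, out_dim, rng):
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

    def embed(self, x):
        return [math.tanh(dot(row, x)) for row in self.W]

    def similarity(self, a, b):
        return dot(self.embed(a), self.embed(b))

    def grad_similarity(self, a, b):
        """Gradient of s(a, b) = sum_k tanh(W_k . a) * tanh(W_k . b) with respect to W."""
        fa, fb = self.embed(a), self.embed(b)
        return [
            [fb[k] * (1 - fa[k] ** 2) * a[j] + fa[k] * (1 - fb[k] ** 2) * b[j]
             for j in range(len(a))]
            for k in range(len(self.W))
        ]

    def sgd_step(self, q, pos, neg, lr, margin):
        """One online step on the triplet hinge loss max(0, margin - s(q, pos) + s(q, neg))."""
        loss = margin - self.similarity(q, pos) + self.similarity(q, neg)
        if loss <= 0:
            return  # triplet already ranked with enough margin: no update
        g_pos = self.grad_similarity(q, pos)
        g_neg = self.grad_similarity(q, neg)
        for k, row in enumerate(self.W):
            for j in range(len(row)):
                # d loss / dW = -d s(q, pos)/dW + d s(q, neg)/dW
                row[j] -= lr * (g_neg[k][j] - g_pos[k][j])


class OnlineMultimodalSimilarity:
    """Hypothetical combiner: S(a, b) = sum_i w_i * s_i(a_i, b_i), weights updated Hedge-style."""

    def __init__(self, dims, hidden, beta=0.8, lr=0.05, margin=1.0, seed=0):
        rng = random.Random(seed)
        self.nets = [ModalityNet(d, hidden, rng) for d in dims]
        self.weights = [1.0 / len(dims)] * len(dims)
        self.beta, self.lr, self.margin = beta, lr, margin

    def similarity(self, a, b):
        # a and b are lists holding one feature vector per modality
        return sum(w * net.similarity(x, y)
                   for w, net, x, y in zip(self.weights, self.nets, a, b))

    def update(self, q, pos, neg):
        for i, net in enumerate(self.nets):
            # Stage (ii): discount modality i if it currently mis-ranks the triplet
            if net.similarity(q[i], pos[i]) <= net.similarity(q[i], neg[i]):
                self.weights[i] *= self.beta
            # Stage (i): refine modality i's nonlinear transformation on the same triplet
            net.sgd_step(q[i], pos[i], neg[i], self.lr, self.margin)
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]


if __name__ == "__main__":
    # Synthetic stream: modality 0 is informative (pos is a noisy copy of q),
    # modality 1 is pure noise.
    data = random.Random(1)

    def vec(n):
        return [data.gauss(0.0, 1.0) for _ in range(n)]

    model = OnlineMultimodalSimilarity(dims=[4, 4], hidden=8)
    for _ in range(300):
        q0 = vec(4)
        pos0 = [x + data.gauss(0.0, 0.1) for x in q0]
        model.update([q0, vec(4)], [pos0, vec(4)], [vec(4), vec(4)])
    print("modality weights:", model.weights)
```

Both stages run on every incoming triplet. On this synthetic stream, the weight of the noise modality should shrink relative to the informative one. The multiplicative update is one simple way to realise "learning the optimal combination" online. The paper's actual combination rule and loss may differ.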