Multi-modal distance metric learning

Authors:
Pengtao Xie;Eric P. Xing
Affiliations:
Department of Computer Science, Tsinghua University, Beijing, China;Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA
Venue:
IJCAI '13: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence
Year:
2013

Abstract

Multi-modal data is growing rapidly with the expansion of social media. Learning a good distance measure for data with multiple modalities is vital for many applications, including retrieval, clustering, classification, and recommendation. In this paper, we propose an effective and scalable multi-modal distance metric learning framework. Based on the multi-wing harmonium model, our method provides a principled way to embed data of arbitrary modalities into a single latent space, in which an optimal distance metric can be learned under proper supervision, i.e., by minimizing the distance between similar pairs while maximizing the distance between dissimilar pairs. The parameters are learned by jointly optimizing the data likelihood under the latent space model and the loss induced by distance supervision. Our method thereby seeks a balance between explaining the data and providing an effective distance metric, which naturally avoids overfitting. We apply our general framework to text/image data and present empirical results on retrieval and classification to demonstrate its effectiveness and scalability.
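
The pairwise supervision and the joint objective described in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: the diagonal metric `weights`, the hinge with `margin` (a common bounded stand-in for "maximizing" dissimilar distances), the `trade_off` coefficient, and the fixed `neg_log_likelihood` value are all assumptions made for illustration. In particular, the multi-wing harmonium likelihood is not modeled here and is represented only by a placeholder number.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector, weights: Vector) -> float:
    """Weighted squared Euclidean distance (diagonal metric, an assumption)."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


def pair_loss(
    latent: List[Vector],
    similar: List[Tuple[int, int]],
    dissimilar: List[Tuple[int, int]],
    weights: Vector,
    margin: float = 1.0,
) -> float:
    """Pull similar pairs together; push dissimilar pairs at least `margin` apart."""
    pull = sum(squared_distance(latent[i], latent[j], weights) for i, j in similar)
    # Hinge term (assumed): a dissimilar pair contributes only while it is
    # closer than `margin` in squared distance.
    push = sum(
        max(0.0, margin - squared_distance(latent[i], latent[j], weights))
        for i, j in dissimilar
    )
    return pull + push


def joint_objective(
    neg_log_likelihood: float, supervision_loss: float, trade_off: float
) -> float:
    """Balance explaining the data against fitting the distance supervision."""
    return neg_log_likelihood + trade_off * supervision_loss


# Hypothetical latent embeddings for three items (e.g. text/image documents).
latent = [(0.0, 0.0), (0.1, 0.0), (2.0, 2.0)]
loss = pair_loss(latent, similar=[(0, 1)], dissimilar=[(0, 2)], weights=(1.0, 1.0))
# The likelihood term is a placeholder value, not computed from any model.
total = joint_objective(neg_log_likelihood=3.5, supervision_loss=loss, trade_off=0.5)
```

In this toy setup the dissimilar pair is already farther apart than the margin, so only the similar pair contributes to the supervision loss. A larger `trade_off` weights the distance supervision more heavily relative to the likelihood term, which is the balance the abstract refers to.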