Multi-modal data is growing rapidly with the rise of social media. Learning a good distance measure for data with multiple modalities is vital for many applications, including retrieval, clustering, classification, and recommendation. In this paper, we propose an effective and scalable multi-modal distance metric learning framework. Based on the multi-wing harmonium model, our method provides a principled way to embed data of arbitrary modalities into a single latent space, in which an optimal distance metric can be learned under proper supervision, i.e., by minimizing the distance between similar pairs while maximizing the distance between dissimilar pairs. The parameters are learned by jointly optimizing the data likelihood under the latent space model and the loss induced by distance supervision. Our method thereby seeks a balance between explaining the data and providing an effective distance metric, which naturally avoids overfitting. We apply our general framework to text/image data and present empirical results on retrieval and classification to demonstrate its effectiveness and scalability.
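To make the joint objective more concrete, the following is a minimal sketch of how a distance-supervision term might be combined with a likelihood term. It is an illustration under stated assumptions, not the method of the paper: the abstract does not give the exact loss, so the squared Euclidean distance, the function names `pairwise_supervision_loss` and `joint_objective`, and the `trade_off` weight are all hypothetical. The latent embeddings `z` are assumed to have been produced already by some latent space model; the multi-wing harmonium itself is not implemented here.

```python
import numpy as np


def pairwise_supervision_loss(z, similar_pairs, dissimilar_pairs):
    """Mean squared distance over similar pairs minus the mean over dissimilar pairs.

    z: array of shape (n_items, latent_dim) holding latent embeddings.
    similar_pairs, dissimilar_pairs: lists of (i, j) index pairs.
    Assumption: squared Euclidean distance; the paper's loss may differ.
    """
    def mean_sq_dist(pairs):
        if len(pairs) == 0:
            return 0.0
        idx = np.asarray(pairs)
        diff = z[idx[:, 0]] - z[idx[:, 1]]
        return float(np.mean(np.sum(diff ** 2, axis=1)))

    # Lower is better: similar pairs close together, dissimilar pairs far apart.
    return mean_sq_dist(similar_pairs) - mean_sq_dist(dissimilar_pairs)


def joint_objective(neg_log_likelihood, supervision_loss, trade_off=1.0):
    """Hypothetical combined objective: data fit plus weighted distance loss."""
    return neg_log_likelihood + trade_off * supervision_loss


# Toy usage with three items in a 2-D latent space.
z = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
loss = pairwise_supervision_loss(z, [(0, 1)], [(0, 2)])  # 1.0 - 9.0 = -8.0
total = joint_objective(10.0, loss, trade_off=0.5)       # 10.0 + 0.5 * (-8.0) = 6.0
```

In this simplified form the dissimilar-pair term is unbounded, so a practical variant would likely cap it (for example with a margin). The likelihood term acts as the counterweight the abstract describes, discouraging embeddings that satisfy the pair constraints but no longer explain the data.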