Embedding algorithms search for a low-dimensional continuous representation of data, but most algorithms handle only objects of a single type, for which pairwise distances are specified. This paper describes a method for embedding objects of different types, such as images and text, into a single common Euclidean space, based on their co-occurrence statistics. The joint distributions are modeled as exponentials of Euclidean distances in the low-dimensional embedding space, which links the problem to convex optimization over positive semidefinite matrices. The local structure of the embedding corresponds to the statistical correlations via random walks in the Euclidean space. We quantify the performance of our method on two text data sets, and show that it consistently and significantly outperforms standard methods of statistical correspondence modeling, such as multidimensional scaling, Isomap, and correspondence analysis.
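The core idea above — modeling the joint distribution of two object types as proportional to the exponential of negative squared Euclidean distance between their embeddings — can be sketched in a few lines. The following is a minimal toy illustration, not the paper's actual convex semidefinite formulation: it fits the two embeddings by plain gradient ascent on the log-likelihood of empirical co-occurrence counts. All names (`counts`, `phi`, `psi`, `model_joint`) and the data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy co-occurrence counts between two object types (e.g. 3 words x 3 documents);
# strong diagonal means word i mostly co-occurs with document i.
counts = np.array([[8., 1., 1.],
                   [1., 8., 1.],
                   [1., 1., 8.]])
p_emp = counts / counts.sum()  # empirical joint distribution

d = 2  # embedding dimension
phi = rng.normal(scale=0.1, size=(3, d))  # embeddings of row objects
psi = rng.normal(scale=0.1, size=(3, d))  # embeddings of column objects

def model_joint(phi, psi):
    """Model joint: p(x, y) proportional to exp(-||phi_x - psi_y||^2)."""
    sq = ((phi[:, None, :] - psi[None, :, :]) ** 2).sum(-1)
    w = np.exp(-sq)
    return w / w.sum()

# Maximize sum_{x,y} p_emp[x,y] * log p_model(x,y) by gradient ascent
# (a simple stand-in for the semidefinite optimization in the paper).
lr = 0.1
for _ in range(3000):
    diff = p_emp - model_joint(phi, psi)
    # Gradients of the log-likelihood w.r.t. the embedding coordinates.
    grad_phi = 2.0 * (diff @ psi - diff.sum(axis=1, keepdims=True) * phi)
    grad_psi = 2.0 * (diff.T @ phi - diff.sum(axis=0)[:, None] * psi)
    phi += lr * grad_phi
    psi += lr * grad_psi

p = model_joint(phi, psi)
# After fitting, strongly co-occurring pairs should sit close together,
# so each row object's nearest column embedding is its matched partner.
```

Because frequently co-occurring pairs get high model probability only when their embeddings are close, the learned geometry directly reflects the co-occurrence statistics, which is what allows heterogeneous objects to be compared in one space.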