Stochastic neighbor embedding (SNE) and its variants are dimensionality reduction (DR) methods built on normalized softmax similarities derived from pairwise distances. These methods aim to reproduce in the low-dimensional embedding space the similarities observed in the high-dimensional data space. Their strong experimental results, compared with previous state-of-the-art methods, stem from their capability to counteract the curse of dimensionality. Previous work has shown that this immunity comes partly from a shift-invariance property that allows appropriately normalized softmax similarities to mitigate the phenomenon of norm concentration. This paper investigates a complementary aspect, namely the cost function that quantifies the mismatch between the similarities computed in the high- and low-dimensional spaces. Stochastic neighbor embedding and its variant t-SNE rely on a single Kullback-Leibler (KL) divergence, whereas neighborhood retrieval and visualization (NeRV) uses a weighted mixture of two dual KL divergences. In this paper, we propose a different mixture of KL divergences, which is a scaled version of the generalized Jensen-Shannon divergence. We show experimentally that this divergence produces embeddings that better preserve small K-ary neighborhoods than both the single KL divergence used in SNE and t-SNE and the mixture used in NeRV. These results allow us to conclude that future improvements in similarity-based DR will likely emerge from better definitions of the cost function.
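To make the comparison of cost functions concrete, the following is a minimal illustrative sketch, not the authors' implementation, of the three mismatch measures mentioned above, evaluated on row-normalized similarity matrices P (data space) and Q (embedding space). The function names, the mixing weights `lam` and `kappa`, and the plain (unscaled) generalized Jensen-Shannon form are assumptions made for illustration; the divergence proposed in the paper is a scaled variant of such a mixture.

```python
# Illustrative sketch only (hypothetical helper names): comparing the single KL
# cost of SNE/t-SNE, the dual-KL mixture of NeRV, and a generalized
# Jensen-Shannon-type mixture, on row-normalized similarity matrices.
import numpy as np

def kl(P, Q, eps=1e-12):
    """Kullback-Leibler divergence KL(P || Q), summed over all entries."""
    return np.sum(P * (np.log(P + eps) - np.log(Q + eps)))

def cost_sne(P, Q):
    """Single KL divergence, as used in SNE and t-SNE: KL(P || Q)."""
    return kl(P, Q)

def cost_nerv(P, Q, lam=0.5):
    """NeRV-style cost: weighted mixture of the two dual KL divergences."""
    return lam * kl(P, Q) + (1.0 - lam) * kl(Q, P)

def cost_gjs(P, Q, kappa=0.5):
    """Generalized Jensen-Shannon-type mixture (unscaled form, for illustration):
    both P and Q are compared to the mixture M = kappa*P + (1-kappa)*Q."""
    M = kappa * P + (1.0 - kappa) * Q
    return kappa * kl(P, M) + (1.0 - kappa) * kl(Q, M)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5
    # Random row-normalized matrices standing in for the softmax similarities
    # computed in the high- and low-dimensional spaces.
    P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)
    Q = rng.random((n, n)); Q /= Q.sum(axis=1, keepdims=True)
    print(cost_sne(P, Q), cost_nerv(P, Q), cost_gjs(P, Q))
```

In this sketch the divergences are summed over all row-wise distributions at once, matching the usual formulation in which the total embedding cost is a sum of per-point divergences between the similarity distributions of the two spaces.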