The problem of extracting the relevant aspects of data was previously addressed through the information bottleneck (IB) method, by (soft) clustering one variable while preserving information about another variable, the relevance variable. The current work extends these ideas to obtain continuous representations that preserve relevant information, rather than discrete clusters, for the special case of multivariate Gaussian variables. While the general continuous IB problem is difficult to solve, we provide an analytic solution for the optimal representation and the tradeoff between compression and relevance for this important case. The optimal representation turns out to be a noisy linear projection onto eigenvectors of the normalized regression matrix $\Sigma_{x|y}\Sigma_x^{-1}$, which is also the basis obtained in canonical correlation analysis. However, in Gaussian IB, the compression tradeoff parameter uniquely determines the dimension, as well as the scale of each eigenvector, through a cascade of structural phase transitions. This introduces a novel interpretation in which solutions of different ranks lie on a continuum parametrized by the compression level. Our analysis also provides a complete analytic expression for the preserved information as a function of the compression (the "information curve") in terms of the eigenvalue spectrum of the data. As in the discrete case, the information curve is concave and smooth, though it is composed of distinct analytic segments, one for each optimal dimension. Finally, we show how the algorithmic theory developed in the IB framework provides an iterative algorithm for obtaining the optimal Gaussian projections.
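To make the structure of the closed-form solution concrete, here is a minimal NumPy sketch that builds the projection matrix $A$ for a given tradeoff parameter $\beta$, so that the representation is $T = AX + \xi$ with $\xi \sim \mathcal{N}(0, I)$. It assumes the standard closed-form row scaling $\alpha_i^2 = (\beta(1-\lambda_i) - 1)/(\lambda_i r_i)$ with $r_i = v_i^\top \Sigma_x v_i$, where $v_i, \lambda_i$ are the left eigenvectors and eigenvalues of $\Sigma_{x|y}\Sigma_x^{-1}$; the function and variable names are illustrative and not taken from any released code.

import numpy as np

def gaussian_ib_projection(sigma_x, sigma_y, sigma_xy, beta):
    """Analytic Gaussian IB projection A for T = A X + xi, xi ~ N(0, I)."""
    # Conditional covariance Sigma_{x|y} = Sigma_x - Sigma_xy Sigma_y^{-1} Sigma_yx.
    sigma_x_given_y = sigma_x - sigma_xy @ np.linalg.solve(sigma_y, sigma_xy.T)
    # Left eigenvectors of Sigma_{x|y} Sigma_x^{-1} are right eigenvectors of
    # its transpose Sigma_x^{-1} Sigma_{x|y}; the eigenvalues lie in (0, 1].
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(sigma_x, sigma_x_given_y))
    rows = []
    for i in np.argsort(eigvals.real):        # most informative (smallest lambda) first
        lam, v = eigvals[i].real, eigvecs[:, i].real
        # The i-th eigenvector enters the solution only past its critical
        # tradeoff beta_i = 1 / (1 - lambda_i): a structural phase transition.
        if 1e-12 < lam < 1.0 and beta > 1.0 / (1.0 - lam):
            r = v @ sigma_x @ v               # r_i = v_i^T Sigma_x v_i
            alpha = np.sqrt((beta * (1.0 - lam) - 1.0) / (lam * r))
            rows.append(alpha * v)
    return np.vstack(rows) if rows else np.zeros((0, sigma_x.shape[0]))

# Toy usage: a random but valid joint covariance over (X, Y) with dims 3 + 2.
rng = np.random.default_rng(0)
m = rng.standard_normal((5, 5))
joint = m @ m.T + 1e-3 * np.eye(5)            # positive definite by construction
sx, sy, sxy = joint[:3, :3], joint[3:, 3:], joint[:3, 3:]
for beta in (1.0, 10.0, 100.0):
    A = gaussian_ib_projection(sx, sy, sxy, beta)
    print(beta, A.shape)                      # rank of A grows with beta

Sweeping $\beta$ and recording the rank of $A$ traces out the cascade of phase transitions described above: each eigenvector switches on at its critical tradeoff value, and its scale grows continuously from zero, so solutions of different ranks form a continuum in the compression level.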