The em algorithm for kernel matrix completion with auxiliary data

Authors:
Koji Tsuda;Shotaro Akaho;Kiyoshi Asai
Affiliations:
Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany and AIST Computational Biology Research Center, Tokyo, 135-0064, Japan;AIST Neuroscience Research Institute, Tsukuba, 305-8568, Japan;Department of Computational Biology, Graduate School of Frontier Science, University of Tokyo, Kashiwa, 277-8562, Japan and AIST Computational Biology Research Center, Tokyo, 135-0064, Japan
Venue:
The Journal of Machine Learning Research
Year:
2003

Abstract

In biological data, observed data are often available only for a subset of samples. When a kernel matrix is derived from such data, the entries for unavailable samples have to be left as missing. In this paper, the missing entries are completed by exploiting an auxiliary kernel matrix derived from another information source. A parametric model of kernel matrices is constructed as a set of spectral variants of the auxiliary kernel matrix, and the missing entries are estimated by fitting this model to the existing entries. For model fitting, we adopt the em algorithm (distinguished from the EM algorithm of Dempster et al., 1977) based on the information geometry of positive definite matrices. We report promising results on bacteria clustering experiments using two marker sequences: 16S and gyrB.
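
The alternating fit described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not spell out the details, so this is not the authors' implementation and rests on several assumptions: the "spectral variants" are taken to be matrices of the form M(b) = sum_j b_j v_j v_j^T, where v_j are eigenvectors of the auxiliary kernel matrix; the divergence is the KL divergence between zero-mean Gaussians with covariances D and M; and the first n samples are the ones with observed entries. All function names (`kl_divergence`, `e_step`, `m_step`, `em_complete`) are hypothetical.

```python
import numpy as np

def kl_divergence(D, M):
    """KL divergence between zero-mean Gaussians N(0, D) and N(0, M)."""
    n = D.shape[0]
    _, logdet_D = np.linalg.slogdet(D)
    _, logdet_M = np.linalg.slogdet(M)
    return 0.5 * (np.trace(np.linalg.solve(M, D)) + logdet_M - logdet_D - n)

def e_step(K_vv, M):
    """Fill the missing blocks of D, keeping the observed block K_vv fixed.

    The filled matrix minimizes KL(D || M) over all completions of K_vv.
    """
    n = K_vv.shape[0]
    M_vv, M_vh, M_hh = M[:n, :n], M[:n, n:], M[n:, n:]
    A = np.linalg.solve(M_vv, M_vh)              # M_vv^{-1} M_vh
    D_vh = K_vv @ A
    D_hh = M_hh - M_vh.T @ A + A.T @ K_vv @ A
    return np.block([[K_vv, D_vh], [D_vh.T, D_hh]])

def m_step(D, V):
    """Refit the spectral weights for a fixed completed matrix D.

    Under the assumed model, b_j = v_j^T D v_j minimizes KL(D || M(b)).
    """
    b = np.einsum("ij,ik,kj->j", V, D, V)
    return b, (V * b) @ V.T                      # V diag(b) V^T

def em_complete(K_vv, K_aux, n_iter=50):
    """Alternate the two projections, starting from the auxiliary matrix."""
    _, V = np.linalg.eigh(K_aux)                 # fixed eigenvectors of K_aux
    M = K_aux.copy()                             # assumed positive definite
    history = []
    for _ in range(n_iter):
        D = e_step(K_vv, M)
        b, M = m_step(D, V)
        history.append(kl_divergence(D, M))
    return D, M, history
```

Each step minimizes the same divergence over one argument while the other is held fixed, so the values in `history` should not increase from one iteration to the next. The returned `D` keeps the observed block unchanged and supplies estimates for the missing rows and columns.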