The em algorithm for kernel matrix completion with auxiliary data

  • Authors:
  • Koji Tsuda; Shotaro Akaho; Kiyoshi Asai

  • Affiliations:
  • Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany, and AIST Computational Biology Research Center, Tokyo, 135-0064, Japan; AIST Neuroscience Research Institute, Tsukuba, 305-8568, Japan; Department of Computational Biology, Graduate School of Frontier Science, University of Tokyo, Kashiwa, 277-8562, Japan, and AIST Computational Biology Research Center, Tokyo, 135-0064, Japan

  • Venue:
  • The Journal of Machine Learning Research
  • Year:
  • 2003

Abstract

In biological data, observed data are often available only for a subset of samples. When a kernel matrix is derived from such data, the entries for unavailable samples have to be left as missing. In this paper, the missing entries are completed by exploiting an auxiliary kernel matrix derived from another information source. A parametric model of kernel matrices is created as a set of spectral variants of the auxiliary kernel matrix, and the missing entries are estimated by fitting this model to the existing entries. For model fitting, we adopt the em algorithm (distinguished from the EM algorithm of Dempster et al., 1977) based on the information geometry of positive definite matrices. We report promising results on bacteria clustering experiments using two marker sequences: 16S and gyrB.
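
The alternating fit described in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the paper and rests on several assumptions the abstract does not state: kernel matrices are treated as covariance matrices of zero-mean Gaussians and compared by KL divergence, the observed samples are ordered first, and the "spectral variants" are matrices of the form V diag(b) Vᵀ, where V holds the eigenvectors of the auxiliary kernel. The function name `em_complete` and its parameters are hypothetical.

```python
import numpy as np

def em_complete(K_obs, aux, n_iter=50, eps=1e-8):
    """Sketch of an em-style completion (assumed formulation, see lead-in).

    K_obs : (v, v) kernel block for the v observed samples (ordered first).
    aux   : (n, n) auxiliary kernel matrix over all n samples.
    Returns the completed (n, n) matrix D and the fitted spectrum b.
    """
    n, v = aux.shape[0], K_obs.shape[0]
    # Model: M(b) = V diag(b) V^T, with V the eigenvectors of the auxiliary kernel.
    w, V = np.linalg.eigh(aux)
    b = np.maximum(w, eps)  # start from the auxiliary spectrum, kept positive
    for _ in range(n_iter):
        M = (V * b) @ V.T
        # e-step: keep the observed block fixed and fill the missing blocks
        # so that D is the closest completion to the current model M.
        A = np.linalg.solve(M[:v, :v], M[:v, v:])  # M_VV^{-1} M_VH
        D = np.empty((n, n))
        D[:v, :v] = K_obs
        D[:v, v:] = K_obs @ A
        D[v:, :v] = D[:v, v:].T
        D[v:, v:] = M[v:, v:] - M[v:, :v] @ A + A.T @ K_obs @ A
        # m-step: refit the spectrum, b_j = v_j^T D v_j for each eigenvector v_j.
        b = np.maximum(np.sum(V * (D @ V), axis=0), eps)
    return D, b
```

Under these assumptions, both steps have closed forms, so each iteration costs one linear solve and a few matrix products. The observed block of the returned matrix is always exactly `K_obs`; only the missing rows and columns are estimated.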