Combining kernels for classification

  • Authors: Tony Jebara; Darrin P. Lewis

  • Affiliations: Columbia University; Columbia University

  • Year: 2006

Abstract

Drawing inferences from large, heterogeneous data sets requires a theoretical framework capable of representing diverse data types: DNA and protein sequences, protein structures, microarray expression data, various kinds of interaction networks, and so on. Kernel methods have emerged as a powerful framework for combining such data, in part because they handle many forms of structured input, including vectors, graphs, and strings. The support vector machine (SVM) is the most popular kernel method, owing to its theoretical underpinnings and strong empirical performance on a wide variety of classification tasks. Several methods have recently been proposed for combining kernels from heterogeneous data sources; in particular, extensions of the SVM can assign relative weights to different data sets according to their utility for a given classification task. All of these methods, however, produce stationary combinations: the relative weights of the kernels do not vary across input examples.

In this work, we describe, implement, and validate a method for combining multiple kernels in a nonstationary fashion, where the kernel combination varies with the input. The approach uses a large-margin, latent-variable generative probabilistic model within the maximum entropy discrimination (MED) framework, and parameter estimation is rendered tractable by variational bounding and an iterative optimization procedure. Specifically, we propose an MED Hilbert-space Gaussian mixture model in which each component is implicitly mapped via a Mercer kernel function, and we show that the support vector machine is a special case of this model. The mixture model combines a given set of kernels in a nonlinear and nonstationary manner while avoiding overfitting through regularization. We also derive an efficient sequential minimal optimization (SMO) algorithm for discriminative parameter estimation.

We empirically investigate the performance of the SVM and its variants on numerous multi-kernel learning tasks, ranging from illustrative synthetic data sets, through commonly used benchmark data sets, to the real-world computational biology problem of protein function annotation. In the majority of cases, and without any particular tuning of the algorithm, the new technique outperforms existing methods.
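The central distinction in the abstract, stationary versus nonstationary kernel combination, can be made concrete in a few lines. The sketch below (Python with NumPy and scikit-learn) is a minimal illustration, not the paper's MED or SMO procedure: the data, the fixed weights w, and the logistic gate g are all hand-picked placeholders standing in for quantities the paper's mixture model would actually learn.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                         # toy inputs
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)      # toy binary labels

# Two base kernels computed on the same examples.
K_lin = linear_kernel(X)                              # global linear structure
K_rbf = rbf_kernel(X, gamma=0.5)                      # local nonlinear structure

# Stationary combination: one fixed weight per kernel, shared by every
# example (these weights are arbitrary placeholders, not learned).
w = np.array([0.3, 0.7])
K_stat = w[0] * K_lin + w[1] * K_rbf

# Nonstationary combination: the mixing weight depends on the input.
# A hand-picked logistic gate on the first coordinate stands in for the
# latent responsibilities that a mixture model would estimate.
g = 1.0 / (1.0 + np.exp(-X[:, 0]))                    # per-example gate in (0, 1)
K_nonstat = np.outer(g, g) * K_rbf + np.outer(1 - g, 1 - g) * K_lin

# Either combined Gram matrix can be handed directly to an SVM.
clf = SVC(kernel="precomputed").fit(K_nonstat, y)
print("training accuracy:", clf.score(K_nonstat, y))
```

Each term in K_nonstat is an elementwise (Schur) product of positive semidefinite matrices, so the gated, input-dependent combination remains a valid Mercer kernel; this is the basic reason a nonstationary mixture of kernels can still be plugged into a standard SVM solver.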