Algorithms based on the Gaussian function or kernel, $\exp(-\|x_i - x_j\|^2/\beta)$, are widely applied in computational biology research. The kernel is well known for its outstanding capability of measuring the remote similarity between any two samples in a mapped space, and it can be used in both unsupervised and supervised settings. Despite its success in bioinformatics applications, the scalar parameter $\beta$ has been shown to significantly influence the final results. No good method for determining the optimal value of $\beta$ has been available so far, because the optimum varies between applications; it is usually identified by trial and error through a global grid search over a predefined candidate range. This global grid search is severely limited by the difficulty of setting proper start and end points of the range and a proper grid scale, and by the high computational cost of the search when the dataset is large or the learning algorithm is complicated. To deal with these problems, we present a systematic protocol consisting of two data-driven approaches for deriving optimal choices of the Gaussian kernel parameter in bioinformatics studies, one for unsupervised cases and the other for supervised applications. The advantage of the two methods is that they depend only on the original dataset. Experiments on 6 datasets demonstrate the robustness and efficacy of the proposed approaches. An online calculator is available at http://www.csbio.sjtu.edu.cn/bioinf/GFO/ for free academic use.
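To make the role of the kernel parameter concrete, the following minimal Python sketch evaluates the Gaussian kernel in the form given above over a small, hand-picked grid of candidate values. It illustrates only the conventional grid-style baseline, not the two data-driven approaches, whose details are not given here. The function names `gaussian_kernel` and `log_grid`, the toy data `X`, and the base-2 logarithmic grid from 2^-2 to 2^6 are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kernel(X, beta):
    """Return the matrix K with K[i, j] = exp(-||x_i - x_j||^2 / beta)."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X ** 2, axis=1)
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 <x_i, x_j>
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negatives from round-off
    return np.exp(-sq_dists / beta)


def log_grid(start_exp, end_exp, num):
    """Hypothetical candidate grid: num values from 2**start_exp to 2**end_exp."""
    return np.logspace(start_exp, end_exp, num=num, base=2.0)


# Toy data (an assumption): three samples in two dimensions.
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])

for beta in log_grid(-2, 6, 5):
    K = gaussian_kernel(X, beta)
    # A small beta drives off-diagonal similarities toward 0; a large beta toward 1.
    print(f"beta={beta:8.2f}  K[0,1]={K[0, 1]:.4f}  K[0,2]={K[0, 2]:.4f}")
```

In a full grid search, each candidate would also require training and evaluating the downstream learning algorithm, which is where the cost noted above arises; both the range endpoints and the grid spacing have to be guessed in advance.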