Clustering support vector machines and its application to local protein tertiary structure prediction

Authors:
Jieyue He; Wei Zhong; Robert Harrison; Phang C. Tai; Yi Pan
Affiliations:
Department of Computer Science, Southeast University, Nanjing, China; Department of Computer Science; Department of Computer Science; Department of Biology, Georgia State University, Atlanta, GA; Department of Computer Science
Venue:
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part II
Year:
2006

Abstract

Support Vector Machines (SVMs) are a new generation of machine learning techniques and have shown strong generalization capability on many data mining tasks. SVMs can handle nonlinear classification by implicitly mapping input samples from the input feature space into a higher-dimensional feature space with a nonlinear kernel function. However, SVMs do not scale well to huge datasets with millions of samples. Granular computing decomposes information into aggregates, called granules, and solves the targeted problem within each granule. We therefore propose a novel computational model called Clustering Support Vector Machines (CSVMs) to deal with complex classification problems on huge datasets. Drawing on both the theory of granular computing and advanced statistical learning methodology, a CSVM is built specifically for each information granule partitioned intelligently by the clustering algorithm. This makes the learning task for each CSVM more specific and simpler. Moreover, because a separate CSVM is built for each granule, training can be easily parallelized, so CSVMs can handle huge datasets efficiently. The CSVM model is used to predict local protein tertiary structure. Compared with the conventional clustering method, the prediction accuracy for local protein tertiary structure improves noticeably when the new CSVM model is used. These encouraging experimental results indicate that the new computational model opens a new way to solve complex classification problems on huge datasets.
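
The cluster-then-classify idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, kernel, or training procedure, so everything below is an assumption made for illustration. The sketch uses plain k-means with a deterministic farthest-point initialisation, and it swaps the kernel SVM for a linear SVM trained by stochastic subgradient descent on the hinge loss so that the example needs only the standard library. The function names (`kmeans`, `train_linear_svm`, `fit_csvm`, `predict`) and the toy data are hypothetical.

```python
import random

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centers, p):
    # Index of the cluster center closest to point p.
    return min(range(len(centers)), key=lambda j: sqdist(centers[j], p))

def kmeans(points, k, iters=20):
    # Farthest-point initialisation (an assumption, chosen for determinism).
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centers, p)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster goes empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100, seed=0):
    # Stochastic subgradient descent on the L2-regularised hinge loss.
    # Labels are assumed to be +1 or -1. This stands in for a kernel SVM.
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1 - eta * lam) for wj in w]  # regularisation shrink
            if margin < 1:  # sample violates the margin: take a hinge step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def fit_csvm(X, y, k):
    # One classifier per granule; assumes every cluster ends up non-empty.
    centers = kmeans(X, k)
    models = []
    for j, c in enumerate(centers):
        idx = [i for i, p in enumerate(X) if nearest(centers, p) == j]
        # Each local model works in coordinates relative to its center.
        Xj = [[a - m for a, m in zip(X[i], c)] for i in idx]
        models.append(train_linear_svm(Xj, [y[i] for i in idx]))
    return centers, models

def predict(model, p):
    # Route the query to its nearest granule, then apply that granule's SVM.
    centers, models = model
    j = nearest(centers, p)
    w, b = models[j]
    score = sum(wj * (a - m) for wj, a, m in zip(w, p, centers[j])) + b
    return 1 if score >= 0 else -1

# Toy data: near x = 0 the label follows the sign of the second coordinate,
# near x = 10 it is reversed, so no single linear boundary fits both groups.
X, y = [], []
for dx in (-0.5, 0.0, 0.5):
    for dy in (1.0, 2.0):
        X += [[dx, dy], [dx, -dy], [10 + dx, dy], [10 + dx, -dy]]
        y += [1, -1, -1, 1]
model = fit_csvm(X, y, k=2)
print([predict(model, q) for q in ([0.1, 1.5], [10.1, 1.5])])
```

In this toy setting each granule's problem is linearly separable even though the combined problem is not, which mirrors the abstract's point that per-granule learning tasks are more specific and simpler. The loop in `fit_csvm` trains each local model independently, so under these assumptions the iterations could be distributed across workers.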