An Effective Support Vector Machines (SVMs) Performance Using Hierarchical Clustering

Authors:
Mamoun Awad; Latifur Khan; Farokh Bastani; I-Ling Yen
Affiliations:
University of Texas at Dallas; University of Texas at Dallas; University of Texas at Dallas; University of Texas at Dallas
Venue:
ICTAI '04 Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence
Year:
2004

Citing: 0
Cited by: 11

Support vector machine classification for large data sets via minimum enclosing ball clustering
Neurocomputing

Nonlinear clustering-based support vector machine for large data sets
Optimization Methods & Software - Mathematical programming in data mining and machine learning

Perspectives on social tagging
Journal of the American Society for Information Science and Technology

Hierarchical clustering support vector machines for classifying type-2 diabetes patients
ISBRA'08 Proceedings of the 4th international conference on Bioinformatics research and applications

Clinical charge profiles prediction for patients diagnosed with chronic diseases using Multi-level Support Vector Machine
Expert Systems with Applications: An International Journal

Support vector machine classification based on fuzzy clustering for large data sets
MICAI'06 Proceedings of the 5th Mexican international conference on Artificial Intelligence

Clustering support vector machines and its application to local protein tertiary structure prediction
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part II

Detecting RNA sequences using two-stage SVM classifier
LSMS'07 Proceedings of the 2007 international conference on Life System Modeling and Simulation

Fuzzy clustering for semi-supervised learning: case study: construction of an emotion lexicon
MICAI'12 Proceedings of the 11th Mexican international conference on Advances in Artificial Intelligence - Volume Part I

The impact of semi-supervised clustering on text classification
Proceedings of the 17th Panhellenic Conference on Informatics

Fast classification for large data sets via random selection clustering and Support Vector Machines
Intelligent Data Analysis

Abstract

The training time for SVMs to compute the maximal marginal hyperplane is at least O(N²) in the data set size N, which makes SVMs unfavorable for large data sets. This paper presents a study on improving the training time of SVMs, specifically when dealing with large data sets, using hierarchical clustering analysis. We use the Dynamically Growing Self-Organizing Tree (DGSOT) algorithm for clustering because it has been shown to overcome the drawbacks of traditional hierarchical clustering algorithms. Clustering analysis helps find the boundary points between two classes, which are the most qualified data points for training SVMs. We present a new approach that combines SVMs and DGSOT: it starts with an initial training set and expands it gradually using the clustering structure produced by the DGSOT algorithm. We compare our approach with the Rocchio Bundling technique in terms of accuracy loss and training time gain using two real benchmark data sets.
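
The boundary-point idea in the abstract can be illustrated with a minimal sketch. This is not the authors' method: plain k-means stands in for DGSOT, and the function names `kmeans` and `boundary_candidates`, as well as the parameters `k` and `keep`, are hypothetical. The sketch clusters each class separately and keeps only the clusters whose centroids lie nearest the opposite class, giving a reduced candidate training set.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; a simple stand-in for DGSOT, which is not reproduced here."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k sampled data points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members (keep it if empty).
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters


def boundary_candidates(pos, neg, k=4, keep=2):
    """Keep, per class, the points of the `keep` clusters nearest the other class.

    Assumption: centroid-to-centroid distance is used as a rough proxy for
    "close to the class boundary"; the paper's own criterion may differ.
    """
    pos_c, pos_cl = kmeans(pos, k)
    neg_c, neg_cl = kmeans(neg, k)

    def nearest_clusters(own_c, own_cl, other_c):
        # Rank non-empty clusters by distance to the closest opposing centroid.
        filled = [i for i in range(len(own_c)) if own_cl[i]]
        filled.sort(key=lambda i: min(math.dist(own_c[i], o) for o in other_c))
        return [p for i in filled[:keep] for p in own_cl[i]]

    return (nearest_clusters(pos_c, pos_cl, neg_c),
            nearest_clusters(neg_c, neg_cl, pos_c))


if __name__ == "__main__":
    rng = random.Random(1)
    # Two synthetic 2-D classes, purely for illustration.
    pos = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
    neg = [(rng.gauss(4.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
    sel_pos, sel_neg = boundary_candidates(pos, neg)
    print(len(pos) + len(neg), "->", len(sel_pos) + len(sel_neg))
```

The reduced sets would then be handed to an SVM trainer. The approach described in the abstract goes further by expanding the training set gradually along the DGSOT tree, which this sketch does not attempt.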