Training an SVM to compute the maximum-margin hyperplane takes at least O(N²) time in the data set size N, which makes SVMs ill-suited to large data sets. This paper studies how to reduce SVM training time on large data sets using hierarchical clustering analysis. We cluster with the Dynamically Growing Self-Organizing Tree (DGSOT) algorithm because it has been shown to overcome the drawbacks of traditional hierarchical clustering algorithms. Clustering analysis helps find the boundary points between two classes, which are the data points best suited to training SVMs. We present a new approach that combines SVMs with DGSOT: it starts with an initial training set and expands it gradually using the clustering structure produced by the DGSOT algorithm. We compare our approach with the Rocchio Bundling technique in terms of accuracy loss and training time gain on two real benchmark data sets.
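
As a rough illustration of why clustering can shrink the training set, the sketch below clusters each class separately and keeps only the clusters whose centroids lie closest to the opposite class, on the assumption that boundary points live there. It is not the method of the paper: plain k-means stands in for DGSOT, the selection is a single pass rather than a gradual expansion, and no SVM is trained. The names `assign`, `kmeans`, and `boundary_subset`, the parameters `k` and `keep`, and the synthetic data are all hypothetical.

```python
import random
from math import dist


def assign(points, centroids):
    """Group each point with its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; a stand-in for DGSOT, which is not reproduced here."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = assign(points, centroids)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for old, members in zip(centroids, clusters)
        ]
    return centroids, assign(points, centroids)


def boundary_subset(pos, neg, k=4, keep=2):
    """Keep the points of the `keep` clusters per class nearest the other class."""
    pos_c, pos_cl = kmeans(pos, k, seed=1)
    neg_c, neg_cl = kmeans(neg, k, seed=2)

    def nearest_clusters(own_c, own_cl, other_c):
        # Rank own clusters by the distance from their centroid
        # to the closest centroid of the opposing class.
        order = sorted(
            range(len(own_c)),
            key=lambda i: min(dist(own_c[i], o) for o in other_c),
        )
        return [p for i in order[:keep] for p in own_cl[i]]

    return nearest_clusters(pos_c, pos_cl, neg_c), nearest_clusters(neg_c, neg_cl, pos_c)


# Synthetic two-class data (an assumption for illustration only).
rng = random.Random(42)
pos = [(rng.gauss(2.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
neg = [(rng.gauss(-2.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]

pos_sub, neg_sub = boundary_subset(pos, neg)
print(len(pos), len(pos_sub), len(neg), len(neg_sub))
```

In a fuller pipeline, the reduced sets `pos_sub` and `neg_sub` would be handed to an SVM trainer in place of the full data, and further clusters would be added if accuracy were insufficient; that loop is left out here.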