BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
CURE: an efficient clustering algorithm for large databases
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Making large-scale support vector machine learning practical
Advances in kernel methods
Fast training of support vector machines using sequential minimal optimization
Advances in kernel methods
Mining high-speed data streams
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Proximal support vector machine classifiers
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
A Tutorial on Support Vector Machines for Pattern Recognition
Data Mining and Knowledge Discovery
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Less is More: Active Learning with Support Vector Machines
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Support Vector Machine Active Learning with Application sto Text Classification
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
STING: A Statistical Information Grid Approach to Spatial Data Mining
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Shrinkage estimator generalizations of Proximal Support Vector Machines
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
PEBL: positive example based learning for Web page classification using SVM
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Clustering Large Datasets in Arbitrary Metric Spaces
ICDE '99 Proceedings of the 15th International Conference on Data Engineering
SVMTorch: support vector machines for large-scale regression problems
The Journal of Machine Learning Research
Finding the most interesting patterns in a database quickly by using sequential sampling
The Journal of Machine Learning Research
Training ν-Support Vector Classifiers: Theory and Algorithms
Neural Computation
Learning concepts from large scale imbalanced data sets using support cluster machines
MULTIMEDIA '06 Proceedings of the 14th annual ACM international conference on Multimedia
Nonlinear clustering-based support vector machine for large data sets
Optimization Methods & Software - Mathematical programming in data mining and machine learning
Fast Local Support Vector Machines for Large Datasets
MLDM '09 Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition
Block-quantized support vector ordinal regression
IEEE Transactions on Neural Networks
Tree Decomposition for Large-Scale SVM Problems
The Journal of Machine Learning Research
A fast data preprocessing procedure for support vector regression
IDEAL'06 Proceedings of the 7th international conference on Intelligent Data Engineering and Automated Learning
Research of granular support vector machine
Artificial Intelligence Review
Granular support vector machine based on mixed measure
Neurocomputing
Hi-index | 0.01 |
Support vector machines (SVMs) have been promising methods for classification and regression analysis due to their solid mathematical foundations, which include two desirable properties: margin maximization and nonlinear classification using kernels. However, despite these prominent properties, SVMs are usually not chosen for large-scale data mining problems because their training complexity is highly dependent on the data set size. Unlike traditional pattern recognition and machine learning, real-world data mining applications often involve huge numbers of data records. Thus it is too expensive to perform multiple scans on the entire data set, and it is also infeasible to put the data set in memory. This paper presents a method, Clustering-Based SVM (CB-SVM), that maximizes the SVM performance for very large data sets given a limited amount of resource, e.g., memory. CB-SVM applies a hierarchical micro-clustering algorithm that scans the entire data set only once to provide an SVM with high quality samples. These samples carry statistical summaries of the data and maximize the benefit of learning. Our analyses show that the training complexity of CB-SVM is quadratically dependent on the number of support vectors, which is usually much less than that of the entire data set. Our experiments on synthetic and real-world data sets show that CB-SVM is highly scalable for very large data sets and very accurate in terms of classification.