Linear classification methods now train in time linear in the data size. In many applications, however, the data contain a large number of samples that do not help improve the quality of the model yet still cost substantial I/O and memory to process. In this paper, we show how a Block Coordinate Descent method based on a Nearest-Neighbor Index can significantly reduce this cost when learning a dual-sparse model. In particular, we employ a truncated loss function to induce a series of convex programs with superior dual sparsity, and we solve each dual problem using Indexed Block Coordinate Descent, which uses Approximate Nearest Neighbor (ANN) search to select active dual variables without incurring I/O cost on irrelevant samples. We prove that, despite the bias and weak guarantees of ANN queries, the proposed algorithm converges globally to the solution defined on the entire dataset, with sublinear complexity per iteration. Experiments under both sufficient-memory and limited-memory conditions show that the proposed approach learns many times faster than other state-of-the-art solvers without sacrificing accuracy.
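To make the high-level idea concrete, below is a minimal illustrative sketch, not the authors' implementation: plain dual coordinate descent for an L1-loss linear SVM in which each outer iteration updates only a small block of dual variables chosen by a nearest-neighbor-style query. The function `query_block` is a hypothetical stand-in for an ANN / maximum-inner-product index and is simulated here by a brute-force scan; all names and parameters are assumptions made for this example.

```python
# Sketch of index-driven block coordinate descent for a linear SVM dual.
# `query_block` is a placeholder for an ANN index query; a real index would
# return near-margin samples approximately, without scanning the whole dataset.
import numpy as np

def query_block(w, X, y, k):
    """Return indices of the k samples closest to the margin,
    i.e. with the smallest functional margin y_i * (w . x_i)."""
    margins = y * (X @ w)
    return np.argsort(margins)[:k]

def indexed_block_coordinate_descent(X, y, C=1.0, block_size=64, outer_iters=200):
    """Dual coordinate descent for an L1-loss linear SVM, restricted to an
    index-selected block of dual coordinates per outer iteration."""
    n, d = X.shape
    alpha = np.zeros(n)                 # dual variables, 0 <= alpha_i <= C
    w = np.zeros(d)                     # primal weights, w = sum_i alpha_i y_i x_i
    Qii = np.einsum('ij,ij->i', X, X)   # diagonal of the dual Hessian

    for _ in range(outer_iters):
        block = query_block(w, X, y, block_size)
        for i in block:
            if Qii[i] == 0.0:
                continue
            grad = y[i] * (X[i] @ w) - 1.0                        # partial gradient
            new_alpha = min(max(alpha[i] - grad / Qii[i], 0.0), C)  # box projection
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                w += delta * y[i] * X[i]                          # keep w in sync
                alpha[i] = new_alpha
    return w, alpha

# Toy usage on random, roughly separable data.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
true_w = rng.normal(size=20)
y = np.sign(X @ true_w)
w, alpha = indexed_block_coordinate_descent(X, y)
print("training accuracy:", np.mean(np.sign(X @ w) == y))
```

The per-coordinate update is the standard closed-form step for the SVM dual; the only change from a full sweep is that the block of coordinates is supplied by the (here simulated) index query, which is where the claimed savings in I/O and per-iteration cost would come from.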