Clustering based large margin classification: a scalable approach using SOCP formulation

Authors:
J. Saketha Nath; C. Bhattacharyya; M. N. Murty
Affiliations:
Indian Institute of Science, Bangalore, Karnataka, India (all authors)
Venue:
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2006

Abstract

This paper presents a novel Second Order Cone Programming (SOCP) formulation for large-scale binary classification. The class-conditional densities are assumed to be mixture distributions in which each component has a spherical covariance, so the second-order statistics of the components can be estimated efficiently with a clustering algorithm such as BIRCH. For each cluster, the second-order moments yield a second-order cone constraint via the Chebyshev-Cantelli inequality; this constraint ensures that any data point in the cluster is classified correctly with high probability. The result is a large-margin SOCP formulation whose size depends on the number of clusters rather than the number of training points. The formulation therefore scales better to large datasets than state-of-the-art Support Vector Machine (SVM) classifiers. Experiments on real-world and synthetic datasets show that the proposed algorithm trains faster than SVM solvers while achieving similar accuracy.
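
The per-cluster constraint described in the abstract can be illustrated with a small sketch. The abstract does not give the exact form, so the version below is an assumption: for a cluster with mean mu, spherical standard deviation sigma, and label y in {+1, -1}, the one-sided Chebyshev (Cantelli) bound implies that y(w.x - b) >= 1 holds with probability at least eta whenever y(w.mu - b) >= 1 + kappa * sigma * ||w||, with kappa = sqrt(eta / (1 - eta)). The margin target of 1, the absence of slack variables, and all function names are hypothetical choices made for illustration, not the authors' code.

```python
import math


def cantelli_kappa(eta):
    """Return kappa = sqrt(eta / (1 - eta)), the multiplier from the Cantelli bound."""
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie strictly between 0 and 1")
    return math.sqrt(eta / (1.0 - eta))


def spherical_moments(points):
    """Return (mean, sigma) for one cluster.

    sigma**2 is the population variance averaged over all dimensions,
    which matches the spherical-covariance assumption sigma**2 * I.
    """
    n = len(points)
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    total_sq = sum((p[d] - mean[d]) ** 2 for p in points for d in range(dim))
    return mean, math.sqrt(total_sq / (n * dim))


def cluster_constraint_holds(w, b, mean, sigma, label, eta):
    """Check the assumed cone constraint for a single cluster.

    Assumed form: label * (w . mean - b) >= 1 + kappa * sigma * ||w||.
    """
    margin = label * (sum(wi * mi for wi, mi in zip(w, mean)) - b)
    w_norm = math.sqrt(sum(wi * wi for wi in w))
    return margin >= 1.0 + cantelli_kappa(eta) * sigma * w_norm
```

The check needs only each cluster's mean and spread, never the individual points, which is why the size of such a formulation grows with the number of clusters rather than the number of training points. In a full solver these constraints would be passed to an SOCP package together with a margin objective; that step is omitted here.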