Multicategory composite least squares classifiers

  • Authors:
  • Seo Young Park; Yufeng Liu; Dacheng Liu; Paul Scholl

  • Affiliations:
  • Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC 27599, USA; Department of Statistics and Operations Research, Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, NC 27599, USA; Boehringer Ingelheim Pharmaceuticals, Inc., 900 Ridgebury Road, PO Box 368, Ridgefield, CT 06877, USA; Boehringer Ingelheim Pharmaceuticals, Inc., 900 Ridgebury Road, PO Box 368, Ridgefield, CT 06877, USA

  • Venue:
  • Statistical Analysis and Data Mining
  • Year:
  • 2010

Abstract

Classification is a widely used statistical tool for information extraction, and multicategory classification arises in many applications. Although binary classification problems have been heavily studied, extensions to the multicategory case have received much less attention. In view of the increased complexity and volume of modern statistical problems, it is desirable to have multicategory classifiers that can handle high-dimensional problems with a large number of classes. Moreover, such classifiers should have sound theoretical properties. Several versions of simultaneous multicategory support vector machines (SVMs) exist in the literature. However, computing the SVM can be difficult for large-scale problems, especially those with a large number of classes. Furthermore, the SVM cannot produce class probability estimates directly. In this article, we propose a novel, efficient multicategory composite least squares classifier (CLS classifier), which uses a new composite squared loss function. The proposed CLS classifier has several important merits: efficient computation for problems with a large number of classes, asymptotic consistency, the ability to handle high-dimensional data, and simple conditional class probability estimation. Our simulated and real examples demonstrate competitive performance of the proposed approach. Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 272-286, 2010