Application of distributed SVM architectures in classifying forest data cover types

  • Authors:
  • Mira Trebar; Nigel Steele

  • Affiliations:
  • University of Ljubljana, Faculty of Computer and Information Science, Trzaska 25, 1000 Ljubljana, Slovenia; Department of Mathematical Sciences, Coventry University, Priory Street, Coventry CV1 5FB, UK

  • Venue:
  • Computers and Electronics in Agriculture
  • Year:
  • 2008

Abstract

In many real-world applications, classifying large data sets, which are often also imbalanced, is difficult because of the small but usually more interesting classes. This study analyzes and evaluates a large data set of forest cover types, a multi-class classification problem with seven imbalanced classes that is used as resource inventory information. The data set was transformed into seven new data sets, and a support vector machine (SVM) was employed to solve binary classification problems on balanced and imbalanced data sets of various sizes. Two approaches were combined and their results presented: distributed SVM architectures, which reduce the complexity of the quadratic optimization problem for very large data sets, and two sampling approaches for classifying imbalanced data sets. The experimental results show that distributed SVM architectures improve accuracy on larger data sets in comparison with a single SVM classifier and can improve the correct classification of the minority class.
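
The transformation of one seven-class problem into seven binary problems, followed by rebalancing, might look roughly like the sketch below. It is an illustration under assumptions, not the authors' procedure: the abstract does not name the two sampling approaches, so random undersampling of the majority class is used here as one common choice, and the function names `one_vs_rest_labels` and `undersample_majority`, the class codes 1 to 7, and the toy data are all hypothetical.

```python
import random

def one_vs_rest_labels(labels, classes):
    """Build one binary label list per class: +1 for that class, -1 for all others."""
    return {c: [1 if y == c else -1 for y in labels] for c in classes}

def undersample_majority(rows, binary_labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal counts.

    Assumption: random undersampling stands in for the unnamed sampling
    approaches mentioned in the abstract.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(binary_labels) if y == 1]
    neg = [i for i, y in enumerate(binary_labels) if y == -1]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row plus an equally sized random subset of the majority.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [binary_labels[i] for i in kept]

# Toy stand-in for the forest cover data: 14 rows, class codes 1..7 (hypothetical).
labels = [1, 2, 2, 1, 3, 2, 2, 7, 1, 2, 4, 5, 6, 2]
rows = [[float(i)] for i in range(len(labels))]

# Seven binary data sets, one per cover type.
binary = one_vs_rest_labels(labels, classes=range(1, 8))

# Balanced version of the "class 1 vs. rest" problem.
balanced_rows, balanced_labels = undersample_majority(rows, binary[1])
```

Each of the seven binary label lists can then be paired with the same feature rows and passed to any binary SVM trainer, either as is (imbalanced) or after rebalancing.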