A hybrid classification scheme for mining multisource geospatial data

Authors:
Ranga Raju Vatsavai;Budhendra Bhaduri
Affiliations:
Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, USA 37831;Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, USA 37831
Venue:
Geoinformatica
Year:
2011

Abstract

Supervised learning methods such as Maximum Likelihood (ML) are often used in land cover (thematic) classification of remote sensing imagery. The ML classifier relies exclusively on the spectral characteristics of thematic classes, whose statistical distributions (class-conditional probability densities) often overlap. The spectral response distributions of thematic classes depend on many factors, including elevation, soil types, and ecological zones. A second problem with statistical classifiers is the requirement for a large number of accurate training samples (10 to 30 × |dimensions|), which are often costly and time-consuming to acquire over large geographic regions. With the increasing availability of geospatial databases, it is possible to exploit the knowledge derived from these ancillary datasets to improve classification accuracy even when the class distributions overlap heavily. Likewise, newer semi-supervised techniques can be adopted to improve the parameter estimates of the statistical model by using a large number of easily available unlabeled training samples. Unfortunately, there is no convenient multivariate statistical model that can be employed for multisource geospatial databases. In this paper, we present a hybrid semi-supervised learning algorithm that effectively exploits freely available unlabeled training samples from multispectral remote sensing images and also incorporates ancillary geospatial databases. We conducted several experiments on Landsat satellite image datasets, and our new hybrid approach shows an improvement of 24% to 36% in overall classification accuracy over conventional classification schemes.
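
The semi-supervised idea mentioned in the abstract, refining the parameter estimates of a statistical classifier with unlabeled samples, can be illustrated with a generic expectation-maximization (EM) loop. The sketch below is not the authors' hybrid algorithm, and it does not incorporate ancillary geospatial data. It assumes Gaussian class-conditional densities with diagonal covariance (a simplification chosen to keep the example free of external libraries), and it assumes every class has at least one labeled sample. All function names are hypothetical.

```python
import math

def fit_gaussians(X, R):
    """Weighted estimates of (prior, mean, diagonal variance) per class.
    X: list of feature vectors; R: per-sample class responsibilities."""
    n_classes, n_dims = len(R[0]), len(X[0])
    total = float(len(X))
    params = []
    for k in range(n_classes):
        w = sum(r[k] for r in R)  # effective sample count for class k
        mean = [sum(r[k] * x[d] for x, r in zip(X, R)) / w
                for d in range(n_dims)]
        # A small floor keeps variances strictly positive.
        var = [sum(r[k] * (x[d] - mean[d]) ** 2 for x, r in zip(X, R)) / w + 1e-6
               for d in range(n_dims)]
        params.append((w / total, mean, var))
    return params

def log_joint(x, prior, mean, var):
    """log p(class) + log p(x | class) for a diagonal Gaussian."""
    ll = math.log(prior)
    for xd, m, v in zip(x, mean, var):
        ll += -0.5 * (math.log(2 * math.pi * v) + (xd - m) ** 2 / v)
    return ll

def posteriors(x, params):
    """Class posterior probabilities, normalized in log space for stability."""
    logs = [log_joint(x, *p) for p in params]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    s = sum(exps)
    return [e / s for e in exps]

def semi_supervised_em(X_lab, y_lab, X_unl, n_classes, n_iter=20):
    """Initialize from labeled samples, then refine with unlabeled ones."""
    one_hot = [[1.0 if y == k else 0.0 for k in range(n_classes)]
               for y in y_lab]
    params = fit_gaussians(X_lab, one_hot)
    for _ in range(n_iter):
        # E-step: soft class memberships for the unlabeled samples.
        soft = [posteriors(x, params) for x in X_unl]
        # M-step: re-estimate from labeled (fixed) and unlabeled (soft) samples.
        params = fit_gaussians(X_lab + X_unl, one_hot + soft)
    return params

def classify(x, params):
    """Assign the class with the highest posterior probability."""
    post = posteriors(x, params)
    return max(range(len(post)), key=post.__getitem__)
```

In this sketch the labeled samples keep their fixed class memberships throughout, while the unlabeled samples contribute fractional weights to each class in proportion to their current posterior probabilities.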