Using classifier ensembles to label spatially disjoint data

Authors:
Larry Shoemaker;Robert E. Banfield;Lawrence O. Hall;Kevin W. Bowyer;W. Philip Kegelmeyer
Affiliations:
Department of Computer Science and Engineering, ENB118, University of South Florida, 4202 E. Fowler Avenue Tampa, FL 33620-9951, USA;Department of Computer Science and Engineering, ENB118, University of South Florida, 4202 E. Fowler Avenue Tampa, FL 33620-9951, USA;Department of Computer Science and Engineering, ENB118, University of South Florida, 4202 E. Fowler Avenue Tampa, FL 33620-9951, USA;Department of Computer Science and Engineering, University of Notre Dame, South Bend, IN 46556, USA;Sandia National Laboratories, Computational Science and Math Research Department, P.O. Box 969, MS 9159 Livermore, CA 94551-0969, USA
Venue:
Information Fusion
Year:
2008


Abstract

We describe an ensemble approach to learning from arbitrarily partitioned data. The partitioning comes from the distributed processing requirements of a large-scale simulation. The volume of data is such that classifiers can train only on data local to a given partition. Because the partitioning reflects the needs of the simulation, the class statistics can vary from partition to partition, and some classes will likely be missing from some partitions. We combine a fast ensemble learning algorithm with probabilistic majority voting in order to learn an accurate classifier from such data. Results from simulations of an impactor bar crushing a storage canister and from facial feature recognition show that regions of interest are successfully identified in spite of the class imbalance in the individual training sets.
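
The general idea of training one model per partition and combining their class probabilities can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify the learner or the exact voting rule, so the sketch swaps in a toy nearest-centroid model and assumes that "probabilistic majority voting" can be approximated by summing each model's class probabilities. All names (`train_centroids`, `predict_proba`, `vote`, the class labels, and the toy data) are hypothetical.

```python
import math
from collections import defaultdict


def train_centroids(points, labels):
    """Fit a toy per-partition model: the mean feature vector of each
    class that is present in this partition's local data."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(points, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict_proba(model, x):
    """Return a probability for each class this partition has seen.
    Assumption: a softmax over negative Euclidean distance to each centroid."""
    scores = {y: math.exp(-math.dist(c, x)) for y, c in model.items()}
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}


def vote(models, x):
    """Sum class probabilities over all partition models and pick the
    largest total. A class missing from a partition simply receives no
    probability mass from that partition's model."""
    totals = defaultdict(float)
    for model in models:
        for y, p in predict_proba(model, x).items():
            totals[y] += p
    return max(totals, key=totals.get)


# Toy partitions (hypothetical data). The first partition contains only
# the "unknown" class, mimicking a class that is absent locally.
partitions = [
    ([(0.0, 0.0), (1.0, 0.0)], ["unknown", "unknown"]),
    ([(0.0, 1.0), (5.0, 5.0)], ["unknown", "salient"]),
    ([(6.0, 5.0), (1.0, 1.0)], ["salient", "unknown"]),
]

# Each model trains only on the data local to its own partition.
models = [train_centroids(pts, lbls) for pts, lbls in partitions]

if __name__ == "__main__":
    print(vote(models, (5.5, 5.0)))
    print(vote(models, (0.2, 0.3)))
```

In this sketch the first model always assigns all of its probability to the only class it has seen, so the combined vote depends on the other partitions outweighing it. Summing probabilities rather than counting hard votes lets confident models on other partitions do that.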