Learning to Predict Salient Regions from Disjoint and Skewed Training Sets
ICTAI '06 Proceedings of the 18th IEEE International Conference on Tools with Artificial Intelligence
We describe an ensemble approach to learning salient regions from arbitrarily partitioned data. The partitioning arises from the distributed processing requirements of large-scale simulations: the data volume is such that classifiers can train only on the data local to a given partition. Because the partition reflects the needs of the simulation rather than the learning task, class statistics can vary from partition to partition, and some classes will likely be missing from some or even most partitions. We combine a fast ensemble learning algorithm with scaled probabilistic majority voting to learn an accurate classifier from such data. Since some simulations are difficult to model without a considerable number of false-positive errors, and since we are essentially building a search engine for simulation data, we order the predicted regions so that most of the top-ranked predictions are likely to be correct (salient). Results from simulation runs of a canister being torn and of a casing being dropped show that regions of interest are successfully identified despite the class imbalance in the individual training sets. Lift-curve analysis shows that data-driven ordering methods provide a statistically significant improvement over the default, natural time-step ordering. This saves the end user significant time by focusing attention on areas of interest without an exhaustive search of all the data.
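The abstract's "scaled probabilistic majority voting" over classifiers trained on disjoint partitions can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name is hypothetical, and the scaling scheme assumed here simply expands each partition-local probability vector to the full class set (zero for classes that partition never saw) and rescales the pooled vote to a distribution.

```python
import numpy as np

def scaled_probabilistic_vote(votes, n_classes):
    """Combine probability votes from classifiers trained on disjoint partitions.

    votes: list of (classes_seen, probs) pairs, one per partition's classifier;
    classes_seen holds the class indices that classifier was trained on, and
    probs the matching probability estimates. Each vote is expanded to the
    full class set (zero for unseen classes) before pooling, so classifiers
    that never saw a class neither support nor penalize it.
    This scaling is an illustrative assumption, not the paper's exact scheme.
    """
    total = np.zeros(n_classes)
    for classes_seen, probs in votes:
        expanded = np.zeros(n_classes)
        expanded[np.asarray(classes_seen)] = probs
        total += expanded
    total /= total.sum()  # rescale the pooled vote back to a probability distribution
    return int(np.argmax(total)), total
```

For example, three classifiers that each saw only two of three classes can still produce a pooled distribution over all three, with the missing class contributing zero from the classifiers that never observed it.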
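The lift-curve comparison between a data-driven ordering and the natural time-step ordering rests on a simple quantity: how much more salient the top-k presented regions are than the overall salience rate. A hedged sketch of that measure, with a hypothetical helper name not taken from the paper:

```python
def lift_at_k(labels_in_order, k):
    """Lift of the top-k presented regions over the overall salience rate.

    labels_in_order: 1 for a truly salient region, 0 otherwise, listed in
    the order the regions are shown to the user (e.g. score-ordered by the
    ensemble, or the default natural time-step order). A lift above 1 means
    the ordering concentrates salient regions near the top.
    """
    base_rate = sum(labels_in_order) / len(labels_in_order)
    return (sum(labels_in_order[:k]) / k) / base_rate
```

Comparing this value for the score-based ordering against the time-step ordering at several cutoffs k is what a lift-curve analysis of the two presentation orders amounts to.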