Positive-versus-Negative Classification for Model Aggregation in Predictive Data Mining
INFORMS Journal on Computing
One-Versus-All (OVA) classification is a classifier construction method in which a k-class prediction task is decomposed into k two-class sub-problems. One base model is trained for each sub-problem, and the base models are then combined into a single model. Aggregate modeling is the process of constructing several base models that are then combined into a single model for prediction; OVA classification is therefore a form of aggregate modeling. This paper reports studies conducted to establish whether OVA classification provides predictive performance gains when large volumes of data are available for modeling, as is commonly the case in data mining. The paper demonstrates, first, that OVA modeling can increase the amount of training data that is used while keeping each base model's training set much smaller than the total amount of available training data; second, that OVA models created from large datasets provide higher predictive performance than single k-class models; third, that boosted OVA base models can provide higher predictive performance than un-boosted OVA base models; and fourth, that when the combination algorithm for base model predictions is able to resolve tied predictions, the resulting aggregate models provide higher predictive performance.
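The following is a minimal sketch of the OVA decomposition and combination described above, not the authors' implementation. It assumes a scikit-learn-style base learner (the paper does not prescribe one), and the names `train_ova`, `predict_ova`, and `neg_sample_size` are hypothetical, introduced here for illustration. Down-sampling the negative class is one way each base model's training set can stay much smaller than the full data, and taking the class whose base model assigns the highest positive-class score is one simple way to resolve tied predictions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_ova(X, y, classes, base_learner=DecisionTreeClassifier,
              neg_sample_size=None, rng=None):
    """Train one binary (one-vs-all) base model per class.

    For each class c, the positives are all rows labelled c; the negatives
    are (optionally) a random sample of the remaining rows, so each base
    model's training set can be much smaller than the total available data.
    Hypothetical helper: names and defaults are illustrative only.
    """
    rng = rng or np.random.default_rng(0)
    models = {}
    for c in classes:
        pos_idx = np.where(y == c)[0]
        neg_idx = np.where(y != c)[0]
        if neg_sample_size is not None and neg_sample_size < len(neg_idx):
            neg_idx = rng.choice(neg_idx, size=neg_sample_size, replace=False)
        idx = np.concatenate([pos_idx, neg_idx])
        model = base_learner()
        model.fit(X[idx], (y[idx] == c).astype(int))  # 1 = class c, 0 = rest
        models[c] = model
    return models

def predict_ova(models, X):
    """Combine the base models' outputs, resolving ties by positive-class score."""
    classes = list(models)
    # scores[i, j] = estimated probability that row i belongs to classes[j]
    scores = np.column_stack([models[c].predict_proba(X)[:, 1] for c in classes])
    # If several base models (or none) claim a row, the highest score wins.
    return np.array(classes)[scores.argmax(axis=1)]

# Example usage (hypothetical data):
# models = train_ova(X_train, y_train, classes=np.unique(y_train), neg_sample_size=5000)
# y_hat  = predict_ova(models, X_test)
```

Swapping the base learner for a boosted one (e.g., scikit-learn's AdaBoostClassifier) would give a boosted-OVA variant in the spirit of the comparison the abstract describes, though the specific boosting setup used in the paper is not reproduced here.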