With the rapid growth of computer storage capacity, both the volume of available data and the demand for scoring models are increasing faster than processing power. However, the main obstacle to the widespread adoption of data mining solutions is the stagnant supply of skilled data analysts, who play a key role in data preparation and model selection. In this paper, we present a parameter-free, scalable classification method, which is a step towards fully automatic data mining. The method is based on Bayes optimal univariate conditional density estimators, naive Bayes classification enhanced with a Bayesian variable selection scheme, and averaging of models using a logarithmic smoothing of the posterior distribution. We focus on the complexity of the algorithms and show how they can cope with data sets that are far larger than the available main memory. We finally report results on the Large Scale Learning challenge, where our method obtains state-of-the-art performance within practical computation time.
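To make two of the ideas above more concrete, here is a minimal Python sketch of a naive Bayes classifier that accumulates its counts chunk by chunk, so the full data set never has to sit in memory, and that scales each variable's log-likelihood by a weight in [0, 1], loosely standing in for the effect of averaging over selective naive Bayes models. It is not the paper's algorithm: the class name `StreamingWeightedNaiveBayes`, the `partial_fit` method, the Laplace-style `smoothing` parameter, and the hand-picked weights are all assumptions made for illustration. The Bayes optimal discretization, the Bayesian variable selection, and the logarithmic smoothing of the posterior are not reproduced, and features are assumed to be already discretized.

```python
import math
from collections import defaultdict


class StreamingWeightedNaiveBayes:
    """Illustrative naive Bayes over pre-discretized features, trained chunk by chunk."""

    def __init__(self, n_features, weights=None, smoothing=1.0):
        self.n_features = n_features
        # Hypothetical per-variable weights in [0, 1]; all ones gives plain naive Bayes.
        self.weights = list(weights) if weights is not None else [1.0] * n_features
        self.smoothing = smoothing
        self.class_counts = defaultdict(int)
        # counts[j][(value, label)] = rows where feature j equals value and the class is label
        self.counts = [defaultdict(int) for _ in range(n_features)]
        self.values = [set() for _ in range(n_features)]

    def partial_fit(self, rows, labels):
        """Update the counts from one chunk; memory grows with distinct values, not rows."""
        for row, y in zip(rows, labels):
            self.class_counts[y] += 1
            for j, v in enumerate(row):
                self.counts[j][(v, y)] += 1
                self.values[j].add(v)

    def log_scores(self, row):
        """Return the unnormalized weighted log-posterior of each class for one row."""
        total = sum(self.class_counts.values())
        scores = {}
        for y, n_y in self.class_counts.items():
            score = math.log(n_y / total)
            for j, v in enumerate(row):
                k = max(len(self.values[j]), 1)
                # .get avoids inserting unseen (value, label) keys into the defaultdict
                p = (self.counts[j].get((v, y), 0) + self.smoothing) / (n_y + self.smoothing * k)
                score += self.weights[j] * math.log(p)
            scores[y] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Usage: two chunks stand in for successive reads from disk.
chunks = [
    ([("a", "x"), ("a", "y")], ["pos", "pos"]),
    ([("b", "x"), ("b", "y")], ["neg", "neg"]),
]
# The second variable is uninformative here, so it is given weight 0.
model = StreamingWeightedNaiveBayes(n_features=2, weights=[1.0, 0.0])
for rows, labels in chunks:
    model.partial_fit(rows, labels)
prediction = model.predict(("a", "x"))
```

A weight of 0 removes a variable entirely and a weight of 1 keeps it at full strength, so intermediate weights give a soft form of variable selection. In this sketch the weights are simply supplied by the caller.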