Symbolic regression of input-output data conventionally treats all data records equally. We suggest a framework for the automatic assignment of weights to data samples that takes each sample's relative importance into account. In this paper, we study the possibilities of improving symbolic regression on real-life data by incorporating weights into the fitness function. We introduce four weighting schemes that define the importance of a point relative to the proximity, surrounding, remoteness, and nonlinear deviation of its k nearest-in-the-input-space neighbors. For enhanced analysis and modeling of large imbalanced data sets, we introduce a simple multidimensional iterative technique for subsampling. This technique allows a sensible partitioning (and compression) of the data into nested subsets of arbitrary size, such that the subsets are balanced with respect to any of the presented weighting schemes. For cases where a given input-output data set contains some redundancy, we suggest an approach that considerably improves the effectiveness of regression by applying more modeling effort to a smaller subset of the data with a similar information content. The improvement comes from better exploration of the search space of potential solutions at the same number of function evaluations. We compare different approaches to regression on five benchmark problems with a fixed budget allocation. We demonstrate that a significant improvement in the quality of the regression models can be obtained with weighted regression, with exploratory regression on a compressed subset of similar information content, or with exploratory weighted regression on the compressed subset weighted by one of the proposed weighting schemes.
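The abstract does not give the exact formulas for the weighting schemes or the subsampling procedure, but the idea of a k-nearest-neighbor-based importance weight and a nested, balance-preserving subsample can be sketched as follows. The function names, the choice of the mean k-NN distance as the "remoteness"-style weight, and the drop-smallest-weight iteration are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def knn_weights(X, k=4):
    """Illustrative importance weight: the mean Euclidean distance to the
    k nearest neighbors in input space, normalized to sum to 1. Isolated
    points (sparse regions) receive larger weights. This is a plausible
    sketch of a remoteness-style scheme, not the paper's exact formula."""
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude each point itself
    knn_dist = np.sort(d, axis=1)[:, :k].mean(axis=1)
    return knn_dist / knn_dist.sum()

def iterative_subsample(X, k=4, target_size=8):
    """Sketch of iterative, weight-balanced subsampling: repeatedly drop
    the point with the smallest current weight (the most redundant one),
    recomputing weights after each removal. Recording the drop order
    would yield the nested family of subsets mentioned in the abstract."""
    idx = np.arange(len(X))
    while len(idx) > max(k + 1, target_size):
        w = knn_weights(np.asarray(X)[idx], k=k)
        idx = np.delete(idx, np.argmin(w))
    return idx  # indices of the retained (compressed) subset
```

Because weights are recomputed after every removal, points in dense (redundant) regions are thinned first, so the retained subset stays balanced under the same weighting scheme, which matches the compression behavior the abstract describes.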