Large scale data mining using genetics-based machine learning

Authors:
Jaume Bacardit; Xavier Llorà
Affiliations:
University of Nottingham, Nottingham, United Kingdom; Google Inc, Mountain View, CA, USA
Venue:
Proceedings of the 14th annual conference companion on Genetic and evolutionary computation
Year:
2012

Abstract

We are living in the petabyte era. We have ever larger data sets to analyze, process, and transform into useful answers for domain experts. Robust data mining tools that can cope with petascale volumes and/or high dimensionality while producing human-understandable solutions are key in several domain areas. Genetics-based machine learning (GBML) techniques are perfect candidates for this task. Recent advances in representations, learning paradigms, and theoretical modelling have shown the competitiveness of non-EC (evolutionary computation) techniques in handling large-scale data analysis. If evolutionary learning techniques aspire to be a relevant player in this context, they need the capacity to process these vast amounts of data, and they need to process them within reasonable time. Moreover, massive computation cycles are getting cheaper every day, giving researchers access to unprecedented computational resources on the edge of petascale computing. Several topics are interlaced with these two requirements, among them: (1) having the proper learning paradigms and knowledge representations, (2) understanding them and knowing when they are suitable for the problem at hand, (3) using efficiency enhancement techniques, and (4) transforming and visualizing the produced solutions to give back as much insight as possible to the domain experts. This tutorial will try to shed light on the above-mentioned questions, following a roadmap that starts by exploring what large scale means and why large is both a challenge and an opportunity for GBML methods. As we will show, the opportunity has multiple facets: efficiency enhancement techniques, representations able to cope with high-dimensional spaces, scalability of learning paradigms, and alternative programming models, each of them helping to make GBML very attractive for large-scale data mining.
Given these building blocks, we will continue by unfolding how we can model the scalability of the components of GBML systems, targeting a better engineering effort that will make embracing large datasets routine. Finally, we will illustrate how all these ideas fit together by reviewing real applications of GBML systems and the further directions that will require serious consideration.