Evolutionary algorithms for grouping high dimensional Email data

  • Authors:
  • Steve Counsell;Xiaohui Liu;Janet McFall;Stephen Swift;Allan Tucker

  • Affiliations:
  • School of Computer Science and Information Systems, Birkbeck College, University of London, Malet Street, London, UK E-mail: steve@dcs.bbk.ac.uk
  • Department of Information Systems and Computing, Brunel University, Uxbridge, UK. Tel.: +44 1895 816240, +44 1895 816253, +44 1895 816253/ E-mail: {Xiaohui.Liu, Stephen.Swift, Allan.Tucker}@brunel ...
  • The Building Crafts College, Stratford, London, UK
  • (Correspd. Tel.: +44 1895 816253) Department of Information Systems and Computing, Brunel University, Uxbridge, UK. Tel.: +44 1895 816240, +44 1895 816253, +44 1895 816253/ E-mail: {Xiaohui.Liu, S ...
  • Department of Information Systems and Computing, Brunel University, Uxbridge, UK. Tel.: +44 1895 816240, +44 1895 816253, +44 1895 816253/ E-mail: {Xiaohui.Liu, Stephen.Swift, Allan.Tucker}@brunel ...

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2002

Abstract

Grouping problems arise in many industrial and medical applications; examples include bin packing, workshop layout design, and graph colouring. Problems of this type have been handled successfully using Grouping Genetic Algorithms. However, where there are perhaps thousands of objects to be grouped, we have found that Genetic Algorithm approaches can run into difficulties. This paper continues our research into a method we have developed for decomposing a large number of objects into mutually exclusive subsets in which within-group dependencies are high and between-group dependencies are low. The method uses an Evolutionary Algorithm approach in which the whole population represents a single solution to the grouping problem, rather than a collection of many candidate solutions. This reduces the resource overheads of a computer implementation, and the results are promising when compared with standard statistical methods and a Hill Climbing algorithm, all applied to email log file data.
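
To make the grouping objective concrete, the following is a minimal sketch of a hill-climbing baseline of the general kind the abstract mentions as a comparison. It is not the authors' method or their fitness function. The names `fitness` and `hill_climb_grouping`, the `threshold` parameter, and the scoring rule (each pair of objects sharing a group contributes its dependency minus a threshold) are all assumptions chosen for illustration. The input `dep` is assumed to be a symmetric matrix of pairwise dependency strengths between objects.

```python
import random


def fitness(groups, dep, threshold):
    """Assumed score: sum of (dependency - threshold) over pairs sharing a group."""
    total = 0.0
    for members in groups:
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                total += dep[members[a]][members[b]] - threshold
    return total


def to_groups(assignment):
    """Turn a list of group labels (one per object) into sorted lists of objects."""
    buckets = {}
    for obj, label in enumerate(assignment):
        buckets.setdefault(label, []).append(obj)
    return sorted(buckets.values())


def hill_climb_grouping(dep, threshold=0.5, iterations=2000, seed=0):
    """Move one object at a time between groups, keeping moves that do not lower the score."""
    rng = random.Random(seed)
    n = len(dep)
    assignment = list(range(n))  # start with every object in its own group
    best = fitness(to_groups(assignment), dep, threshold)
    for _ in range(iterations):
        obj = rng.randrange(n)
        new_label = rng.randrange(n)  # n labels are enough for n objects
        if new_label == assignment[obj]:
            continue
        candidate = assignment[:]
        candidate[obj] = new_label
        score = fitness(to_groups(candidate), dep, threshold)
        if score >= best:  # accept improving or neutral moves only
            assignment, best = candidate, score
    return to_groups(assignment), best
```

Under this assumed score, a pair with dependency above `threshold` is rewarded for sharing a group and a pair below it is penalised, so the search is pushed towards mutually exclusive subsets with high within-group and low between-group dependencies. The sketch recomputes the full score for every candidate move, which is simple but would be slow for the thousands of objects the abstract has in mind.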