Improving Markov chain classification using string transformations and evolutionary search

  • Authors:
  • Timothy Meekhof;Terence Soule;Robert B. Heckendorn

  • Affiliations:
  • University of Idaho, Moscow, ID, USA;University of Idaho, Moscow, ID, USA;University of Idaho, Moscow, ID, USA

  • Venue:
  • Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation
  • Year:
  • 2009

Abstract

Markov chain classification, sometimes called n-gram modeling, is a widely used and powerful tool for problems that involve sequences of tokens drawn from a finite set. It has been applied to a wide range of tasks, including natural language modeling, author identification, protein similarity searches, and even bird-song recognition. An improvement in Markov chain classification could therefore have broad implications across many fields. Our new system, called SCS, improves upon Markov chain classification by introducing a preprocessing step in which an arbitrary set of transformation functions is applied to the input sequences. Since the space of possible transformations is unbounded, a genetic algorithm (GA) is used to search for functions that improve classification. We show that the GA consistently finds preprocessing functions that substantially improve the performance of the Markov chain model.
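
To make the idea of a preprocessing step concrete, the following is a minimal sketch of a Markov chain classifier that applies a transformation to every sequence before counting transitions. It is not the SCS implementation: the abstract does not specify the transformation functions or the GA, so the class name `MarkovChainClassifier`, the order-1 (bigram) model, the add-one smoothing, and the `collapse_case` transformation are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


class MarkovChainClassifier:
    """Order-1 Markov chain classifier with add-one smoothing (illustrative)."""

    def __init__(self, transform=None):
        # transform: hypothetical preprocessing function (str -> str);
        # defaults to the identity, i.e. a plain Markov chain classifier.
        self.transform = transform or (lambda s: s)
        self.counts = {}        # label -> {prev_token: {next_token: count}}
        self.alphabet = set()   # tokens seen after transformation

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            seq = self.transform(seq)
            table = self.counts.setdefault(
                label, defaultdict(lambda: defaultdict(int))
            )
            self.alphabet.update(seq)
            # Count each adjacent token pair for this class.
            for prev, nxt in zip(seq, seq[1:]):
                table[prev][nxt] += 1
        return self

    def log_likelihood(self, seq, label):
        seq = self.transform(seq)
        table = self.counts[label]
        v = len(self.alphabet) + 1  # +1 reserves mass for unseen tokens
        total = 0.0
        for prev, nxt in zip(seq, seq[1:]):
            row = table.get(prev, {})
            total += math.log((row.get(nxt, 0) + 1) / (sum(row.values()) + v))
        return total

    def predict(self, seq):
        # Choose the class whose transition model best explains the sequence.
        return max(self.counts, key=lambda lb: self.log_likelihood(seq, lb))


def collapse_case(s):
    """Assumed example transformation: merge upper- and lower-case tokens."""
    return s.lower()


train = ["abababab", "ABABAB", "aabbaabb", "AABBAABB"]
labels = ["alt", "alt", "pair", "pair"]
clf = MarkovChainClassifier(transform=collapse_case).fit(train, labels)
print(clf.predict("BABABA"))
```

In a search such as the one the abstract describes, a candidate transformation would take the place of `collapse_case`, and its fitness could be measured as the classification accuracy obtained on held-out sequences; that fitness definition is likewise an assumption here.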