Using evolutionary optimization to improve Markov-based classification with limited training data

Authors:
Timothy Meekhof;Robert B. Heckendorn
Affiliations:
University of Idaho, Moscow, ID;University of Idaho, Moscow, ID
Venue:
GECCO '05: Proceedings of the 7th Annual Conference on Genetic and Evolutionary Computation
Year:
2005

Abstract

Bayesian classification using Markov models of token strings is used in many areas, such as computational linguistics, speech recognition, and bioinformatics. Unfortunately, for many problems the available data sets are too small to accurately estimate the large number of parameters in a Markov model. In our work, we explore the possibility of using string-space transformations to reduce the perplexity of the modeling problem and thereby improve model performance. The set of all possible string-to-string transformation functions is far too large to search exhaustively. By using a genetic algorithm to search for transformation functions that improve the performance of a Markov-based classifier, we are able to construct a classifier system that performs better than the Markov classifier alone. We then demonstrate the improved performance on the problem of classifying English and Spanish character strings, where the training set size is arbitrarily limited.
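The abstract does not specify the model order, the smoothing scheme, or how transformation functions are encoded, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a first-order (character bigram) Markov model per class with add-one smoothing, and restricts transformations to hypothetical per-letter substitution maps: a genome of 26 integers sending each letter to some letter, so that merging letters shrinks the alphabet of V symbols and with it the roughly V² bigram parameters that must be estimated. A simple genetic algorithm (tournament selection, uniform crossover, point mutation, elitism) searches for a map that maximizes accuracy on a held-out validation set. The word lists are made-up toy data, not the paper's data set, and all names (`apply_map`, `fitness`, `evolve`, `TRAIN`, `VALID`) are illustrative.

```python
import math
import random
import string
from collections import Counter

ALPHABET = string.ascii_lowercase  # assumption: only lowercase a-z are modeled


def apply_map(genome, s):
    """Hypothetical transformation: send each letter to ALPHABET[genome[i]]; drop other chars."""
    return "".join(ALPHABET[genome[ALPHABET.index(c)]] for c in s.lower() if c in ALPHABET)


def train(strings, genome):
    """Count bigrams of the transformed strings; '^' marks the start of each string."""
    pairs, prevs = Counter(), Counter()
    for s in strings:
        t = "^" + apply_map(genome, s)
        for a, b in zip(t, t[1:]):
            pairs[a, b] += 1
            prevs[a] += 1
    return pairs, prevs


def log_likelihood(model, s, genome, vocab_size):
    """Log P(s | class) under a first-order Markov chain with add-one smoothing."""
    pairs, prevs = model
    t = "^" + apply_map(genome, s)
    return sum(math.log((pairs[a, b] + 1) / (prevs[a] + vocab_size))
               for a, b in zip(t, t[1:]))


def fitness(genome, train_data, valid_data):
    """Validation accuracy of the Bayes classifier built on transformed strings."""
    vocab_size = len(set(genome))  # merged letters shrink the alphabet
    total = sum(len(ss) for ss in train_data.values())
    models = {lab: (train(ss, genome), math.log(len(ss) / total))
              for lab, ss in train_data.items()}
    correct = count = 0
    for label, strings in valid_data.items():
        for s in strings:
            # argmax over classes of log prior + log likelihood
            guess = max(models, key=lambda lab: models[lab][1]
                        + log_likelihood(models[lab][0], s, genome, vocab_size))
            correct += guess == label
            count += 1
    return correct / count


def evolve(train_data, valid_data, pop_size=30, generations=40, seed=0):
    """Genetic search over substitution maps, seeded with the identity map."""
    rng = random.Random(seed)
    n = len(ALPHABET)
    pop = [list(range(n))] + [[rng.randrange(n) for _ in range(n)]
                              for _ in range(pop_size - 1)]
    for _ in range(generations):
        scored = [(fitness(g, train_data, valid_data), g) for g in pop]
        next_pop = [max(scored, key=lambda x: x[0])[1]]  # elitism: best genome survives
        while len(next_pop) < pop_size:
            # tournament selection (size 3) for each parent
            p1 = max(rng.sample(scored, 3), key=lambda x: x[0])[1]
            p2 = max(rng.sample(scored, 3), key=lambda x: x[0])[1]
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]  # uniform crossover
            child[rng.randrange(n)] = rng.randrange(n)  # point mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=lambda g: fitness(g, train_data, valid_data))


# Toy word lists for illustration only (not the paper's data).
TRAIN = {
    "english": ["the", "and", "with", "house", "water", "think", "night", "which"],
    "spanish": ["que", "con", "casa", "agua", "noche", "donde", "cuando", "gente"],
}
VALID = {
    "english": ["where", "about", "people", "through"],
    "spanish": ["porque", "mucho", "ciudad", "hasta"],
}

if __name__ == "__main__":
    identity = list(range(len(ALPHABET)))
    best = evolve(TRAIN, VALID)
    print("identity map accuracy:", fitness(identity, TRAIN, VALID))
    print("evolved map accuracy: ", fitness(best, TRAIN, VALID))
    print("distinct symbols used:", len(set(best)))
```

Because the identity map is seeded into the initial population and the best genome always survives, the evolved map never scores below the plain Markov classifier on the validation set. On data this small that score overfits easily, so a separate test set would be needed to judge whether a transformation really generalizes.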