An effective heuristic for the smallest grammar problem

Authors:
Florian Benz; Timo Kötzing
Affiliations:
Saarland University, Saarbrücken, Germany; Universität Jena, Jena, Germany
Venue:
Proceedings of the 15th annual conference on Genetic and evolutionary computation
Year:
2013

References:

A new challenge for compression algorithms: genetic sequences. Information Processing and Management: an International Journal, Special issue: data compression.
MAX-MIN Ant system. Future Generation Computer Systems.
An Introduction to Genetic Algorithms (book).
A Corpus for the Evaluation of Lossless Compression Algorithms. DCC '97: Proceedings of the Conference on Data Compression.
Data Compression Using Long Common Strings. DCC '99: Proceedings of the Conference on Data Compression.
Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theoretical Computer Science.
Identifying hierarchical structure in sequences: a linear-time algorithm. Journal of Artificial Intelligence Research.
Searching for smallest grammars on large sequences and application to DNA. Journal of Discrete Algorithms.
Choosing word occurrences for the smallest grammar problem. LATA'10: Proceedings of the 4th International Conference on Language and Automata Theory and Applications.
Grammar-based codes: a new class of universal lossless source codes. IEEE Transactions on Information Theory.
Efficient universal lossless data compression algorithms based on a greedy sequential grammar transform. I. Without context models. IEEE Transactions on Information Theory.
Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory.
The smallest grammar problem. IEEE Transactions on Information Theory.

Abstract

The smallest grammar problem is the problem of finding the smallest context-free grammar that generates exactly one given sequence. Approximating the problem within a ratio of less than 8569/8568 is known to be NP-hard. Most work on this problem has focused on finding decent solutions quickly (mostly in linear time) rather than on heuristics that find high-quality solutions. Inspired by a new perspective on the problem presented by Carrascosa et al. (2010), we investigate the performance of different heuristics on the problem. The aim is to find good solutions for large instances by allowing more than linear running time. We propose a hybrid of a MAX-MIN ant system and a genetic algorithm that, in combination with a novel local search, outperforms the state of the art on all files of the Canterbury corpus, a standard benchmark suite. Furthermore, this hybrid performs well on a standard DNA corpus.
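The abstract describes the approach only at a high level. The Python sketch below illustrates, as we understand it, the kind of decomposition that the perspective of Carrascosa et al. (2010) suggests: a candidate solution is a set of words (constituents), each of which gets its own rule, and for a fixed set the size of the resulting grammar can be computed by parsing every right-hand side optimally as a shortest path. Several assumptions apply. Grammar size is taken to be the total length of all right-hand sides. The function names (`parse_length`, `grammar_size`, `repeated_substrings`, `greedy_constituents`) are hypothetical. A naive greedy hill climber stands in for the paper's MAX-MIN ant system / genetic algorithm hybrid and its local search, neither of which is reproduced here.

```python
def parse_length(word, constituents):
    """Fewest symbols needed to spell `word` using single characters and
    words from `constituents`, excluding `word` itself. This is a shortest
    path over positions 0..len(word), solved by dynamic programming."""
    n = len(word)
    best = [0] + [float("inf")] * n  # best[i]: fewest symbols for word[:i]
    for i in range(1, n + 1):
        for j in range(i):
            piece = word[j:i]
            # A single character is always usable (a terminal); a longer
            # piece is usable only if it is another chosen constituent.
            if i - j == 1 or (piece != word and piece in constituents):
                best[i] = min(best[i], best[j] + 1)
    return best[n]


def grammar_size(text, constituents):
    """Assumed size measure: total length of all right-hand sides, i.e. the
    start rule for `text` plus one rule per constituent, each parsed optimally."""
    return parse_length(text, constituents) + sum(
        parse_length(w, constituents) for w in constituents
    )


def repeated_substrings(text, min_len=2):
    """Candidate words: substrings of length >= min_len that occur at least
    twice in `text` (overlapping occurrences are counted)."""
    counts = {}
    for i in range(len(text)):
        for j in range(i + min_len, len(text) + 1):
            s = text[i:j]
            counts[s] = counts.get(s, 0) + 1
    return {s for s, c in counts.items() if c >= 2}


def greedy_constituents(text):
    """Hypothetical stand-in for a metaheuristic: repeatedly add the single
    candidate word that lowers the grammar size the most; stop when none does."""
    chosen = set()
    best = grammar_size(text, chosen)
    candidates = repeated_substrings(text)
    while True:
        best_word = None
        for w in sorted(candidates - chosen):  # sorted for determinism
            size = grammar_size(text, chosen | {w})
            if size < best:
                best, best_word = size, w
        if best_word is None:
            return chosen, best
        chosen.add(best_word)


if __name__ == "__main__":
    print(greedy_constituents("abcabcabcabc"))  # ({'abc'}, 7)
```

On "abcabcabcabc" the empty set gives size 12, and choosing "abc" gives 4 + 3 = 7 (the start rule uses four copies of the new nonterminal, whose rule has three terminals). No further single word lowers the size, so the greedy loop stops there. A search that explores many constituent sets, such as the hybrid described in the abstract, is intended to avoid committing to choices in this myopic way.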