A new challenge for compression algorithms: genetic sequences
Information Processing and Management: an International Journal - Special issue: data compression
Future Generation Computer Systems
An Introduction to Genetic Algorithms
An Introduction to Genetic Algorithms
A Corpus for the Evaluation of Lossless Compression Algorithms
DCC '97 Proceedings of the Conference on Data Compression
Data Compression Using Long Common Strings
DCC '99 Proceedings of the Conference on Data Compression
Application of Lempel-Ziv factorization to the approximation of grammar-based compression
Theoretical Computer Science
Identifying hierarchical structure in sequences: a linear-time algorithm
Journal of Artificial Intelligence Research
Searching for smallest grammars on large sequences and application to DNA
Journal of Discrete Algorithms
Choosing word occurrences for the smallest grammar problem
LATA'10 Proceedings of the 4th international conference on Language and Automata Theory and Applications
Grammar-based codes: a new class of universal lossless source codes
IEEE Transactions on Information Theory
IEEE Transactions on Information Theory
Compression of individual sequences via variable-rate coding
IEEE Transactions on Information Theory
IEEE Transactions on Information Theory
The smallest grammar problem is the problem of finding the smallest context-free grammar that generates exactly one given sequence. Approximating the problem within a ratio of less than 8569/8568 is known to be NP-hard. Most work on this problem has focused on finding decent solutions quickly (mostly in linear time), rather than on heuristics aimed at solution quality. Inspired by a new perspective on the problem presented by Carrascosa et al. (2010), we investigate the performance of different heuristics on the problem. The aim is to find good solutions for large instances by allowing more than linear time. We propose a hybrid of a max-min ant system and a genetic algorithm that, in combination with a novel local search, outperforms the state of the art on all files of the Canterbury corpus, a standard benchmark suite. Furthermore, this hybrid performs well on a standard DNA corpus.
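To make the problem statement concrete, the following minimal sketch compares two grammars that each generate exactly one string. It is an illustration only, not the authors' method: the helper names `expand` and `grammar_size` are hypothetical, and grammar size is assumed here to be the total length of all right-hand sides, which is one common convention.

```python
# Illustrative sketch: a grammar is a dict mapping each nonterminal to its
# single right-hand side (a list of symbols). Any symbol that is not a key
# of the dict is treated as a terminal. This representation is an assumption.

def expand(grammar, symbol="S"):
    """Return the one string generated from `symbol`."""
    if symbol not in grammar:
        return symbol  # terminal symbol
    return "".join(expand(grammar, s) for s in grammar[symbol])

def grammar_size(grammar):
    """Assumed size measure: total number of symbols on all right-hand sides."""
    return sum(len(rhs) for rhs in grammar.values())

text = "abcabcabcabc"

# Trivial grammar: one rule that spells out the whole sequence.
trivial = {"S": list(text)}

# Smaller grammar that reuses repeated substrings through nonterminals.
smaller = {
    "S": ["A", "A"],
    "A": ["B", "B"],
    "B": ["a", "b", "c"],
}

print(expand(trivial) == text, grammar_size(trivial))
print(expand(smaller) == text, grammar_size(smaller))
```

Both grammars generate the same sequence, but the second needs fewer symbols. The smallest grammar problem asks for a grammar that minimizes this size over all grammars generating the given sequence.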