Smaller representation of finite state automata

Authors:
Jan Daciuk;Dawid Weiss
Affiliations:
Department of Intelligent Interactive Systems, Gdańsk University of Technology, Poland;Institute of Computing Science, Poznan University of Technology, Poland
Venue:
Theoretical Computer Science
Year:
2012

Citing 11
Cited 0

Optimization of parser tables for portable compilers
ACM Transactions on Programming Languages and Systems (TOPLAS) - Lecture notes in computer science Vol. 174

Storing a Sparse Table with O(1) Worst Case Access Time
Journal of the ACM (JACM)

Applications of finite automata representing large vocabularies
Software—Practice & Experience

Storing a sparse table
Communications of the ACM

How to squeeze a lexicon
Software—Practice & Experience

Practical Optimizations for Automata
WIA '97 Revised Papers from the Second International Workshop on Implementing Automata

Experiments with Automata Compression
CIAA '00 Revised Papers from the 5th International Conference on Implementation and Application of Automata

Ziv Lempel Compression of Huge Natural Language Data Tries Using Suffix Arrays
CPM '99 Proceedings of the 10th Annual Symposium on Combinatorial Pattern Matching

Word hy-phen-a-tion by com-put-er (hyphenation, computer)

A Compression Method for Natural Language Automata
Proceedings of the 2009 conference on Finite-State Methods and Natural Language Processing: Post-proceedings of the 7th International Workshop FSMNLP 2008

Incremental and semi-incremental construction of pseudo-minimal automata
CIAA'05 Proceedings of the 10th international conference on Implementation and Application of Automata

Quantified Score

Hi-index	5.23

Abstract

This paper is a follow-up to Jan Daciuk's experiments on space-efficient finite state automata representation that can be used directly for traversals in main memory (Daciuk, 2000) [4]. We investigate several techniques for reducing the memory footprint of minimal automata, mainly exploiting the fact that transition labels and transition pointer offset values are not evenly distributed and so are suitable for compression. We achieve a size reduction of around 20%-30% compared to the original representation given in [4]. This result is comparable to state-of-the-art dictionary compression techniques such as the LZ-trie method (Ristov and Laporte, 1999) [15], while our approach remains memory and CPU efficient during construction.
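
The observation that pointer offsets are unevenly distributed can be illustrated with a small sketch. The example below assumes a generic variable-length byte encoding (7 payload bits per byte plus a continuation bit); the paper's actual encodings may differ, and the names `encode_varint`, `decode_varint`, and the sample `offsets` list are hypothetical, chosen only for illustration. It shows that when most offsets are small, storing them in a variable number of bytes takes less space than a fixed 4-byte field, while the values can still be decoded in place during traversal.

```python
from typing import List, Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer using 7 payload bits per byte."""
    if value < 0:
        raise ValueError("offsets must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)  # last byte: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one integer starting at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def decode_all(data: bytes) -> List[int]:
    """Decode a concatenated sequence of variable-length integers."""
    values = []
    pos = 0
    while pos < len(data):
        value, pos = decode_varint(data, pos)
        values.append(value)
    return values


# Hypothetical transition pointer offsets: mostly small, a few large.
offsets = [1, 3, 2, 120, 5, 70000, 4, 9, 300, 2]

fixed_size = 4 * len(offsets)  # fixed-width layout: 4 bytes per offset
packed = b"".join(encode_varint(o) for o in offsets)

print("fixed-width bytes:", fixed_size)
print("variable-length bytes:", len(packed))
```

The saving in this toy example is much larger than the 20%-30% reported in the abstract because the sample contains only offsets and the distribution was chosen by hand; a real automaton also stores labels and flags, and its offset distribution depends on the data.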