Smaller representation of finite state automata

Authors:
Jan Daciuk; Dawid Weiss
Affiliations:
Knowledge Engineering Department, Gdansk University of Technology, Poland; Institute of Computing Science, Poznan University of Technology, Poland
Venue:
CIAA'11: Proceedings of the 16th International Conference on Implementation and Application of Automata
Year:
2011

References (6):
Applications of finite automata representing large vocabularies. Software: Practice and Experience.
How to squeeze a lexicon. Software: Practice and Experience.
Experiments with Automata Compression. CIAA '00: Revised Papers from the 5th International Conference on Implementation and Application of Automata.
Ziv Lempel Compression of Huge Natural Language Data Tries Using Suffix Arrays. CPM '99: Proceedings of the 10th Annual Symposium on Combinatorial Pattern Matching.
Word hy-phen-a-tion by com-put-er (hyphenation, computer).
A Compression Method for Natural Language Automata. Proceedings of the 2009 Conference on Finite-State Methods and Natural Language Processing: Post-proceedings of the 7th International Workshop FSMNLP 2008.

Abstract

This paper is a follow-up to Jan Daciuk's experiments on space-efficient finite-state automaton representations that can be used directly for traversals in main memory [4]. We investigate several techniques for reducing the memory footprint of minimal automata, mainly exploiting the fact that transition labels and transition pointer offsets are not evenly distributed and are therefore suitable for compression. We achieve a size reduction of around 20-30% compared to the original representation given in [4]. This result is comparable to state-of-the-art dictionary compression techniques such as the LZ-trie method [10], while our approach remains memory- and CPU-efficient during construction.
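The core observation, that transition labels and pointer offsets are skewed and therefore compressible, can be illustrated with a small sketch. This is not the paper's encoding. It assumes a hypothetical arc layout: a flags byte holds the "final" and "last arc" bits plus a 5-bit index into a table of the most frequent labels (index 0 means an explicit label byte follows), and each target offset is written as variable-length 7-bit groups (vbyte). The names `Transition`, `build_label_index`, `encode`, `decode`, the synthetic sample data, and the 6-byte fixed-width baseline are all illustrative assumptions.

```python
from collections import Counter
from typing import NamedTuple


class Transition(NamedTuple):
    label: int    # byte value of the arc label
    target: int   # offset of the target state (synthetic in this sketch)
    final: bool   # arc ends an accepted word
    last: bool    # last outgoing arc of its state


def vbyte_encode(n: int) -> bytes:
    """Write a non-negative int as 7-bit groups, low group first; high bit = more follow."""
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def vbyte_decode(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Read one vbyte value starting at pos; return (value, next position)."""
    value = shift = 0
    while True:
        b = data[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            return value, pos
        shift += 7


def build_label_index(arcs: list[Transition], top: int = 31):
    """Rank labels by frequency; the `top` most frequent get codes 1..top (5-bit field)."""
    assert 1 <= top <= 31
    ranked = [label for label, _ in Counter(a.label for a in arcs).most_common(top)]
    return {label: i + 1 for i, label in enumerate(ranked)}, ranked


def encode(arcs: list[Transition], index: dict[int, int]) -> bytes:
    out = bytearray()
    for a in arcs:
        code = index.get(a.label, 0)               # 0 = label not in the frequent table
        out.append((a.final << 7) | (a.last << 6) | code)
        if code == 0:
            out.append(a.label)                    # escape: explicit label byte
        out += vbyte_encode(a.target)              # small offsets cost a single byte
    return bytes(out)


def decode(data: bytes, ranked: list[int]) -> list[Transition]:
    arcs, pos = [], 0
    while pos < len(data):
        flags = data[pos]
        pos += 1
        code = flags & 0x1F
        if code:
            label = ranked[code - 1]
        else:
            label = data[pos]
            pos += 1
        target, pos = vbyte_decode(data, pos)
        arcs.append(Transition(label, target, bool(flags & 0x80), bool(flags & 0x40)))
    return arcs


def sample_transitions() -> list[Transition]:
    """Synthetic arcs: labels come from real words, flags and offsets are arbitrary."""
    words = ["cat", "cats", "car", "cart", "care", "dog", "dogs", "door", "zebra", "quiz"]
    arcs = []
    for w in words:
        for i, ch in enumerate(w):
            arcs.append(Transition(ord(ch), (len(arcs) * 37) % 300,
                                   final=(i == len(w) - 1), last=(i == 0)))
    return arcs


if __name__ == "__main__":
    arcs = sample_transitions()
    index, ranked = build_label_index(arcs, top=8)  # small `top` so the escape path is used
    blob = encode(arcs, index)
    assert decode(blob, ranked) == arcs
    print(len(blob), "bytes vs", 6 * len(arcs), "bytes in a fixed 6-byte-per-arc layout")
```

In this toy layout, frequent labels need no separate byte and offsets below 128 need one byte instead of a fixed-width field, which is where the savings come from. A real format would also have to choose how offsets are computed so that most of them stay small.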