Applications of finite automata representing large vocabularies
Software—Practice & Experience
Experiments with Automata Compression
CIAA '00 Revised Papers from the 5th International Conference on Implementation and Application of Automata
Ziv Lempel Compression of Huge Natural Language Data Tries Using Suffix Arrays
CPM '99 Proceedings of the 10th Annual Symposium on Combinatorial Pattern Matching
Word hy-phen-a-tion by com-put-er
A Compression Method for Natural Language Automata
Proceedings of the 2009 conference on Finite-State Methods and Natural Language Processing: Post-proceedings of the 7th International Workshop FSMNLP 2008
This paper is a follow-up to Jan Daciuk's experiments on space-efficient representations of finite-state automata that can be traversed directly in main memory [4]. We investigate several techniques for reducing the memory footprint of minimal automata. They mainly exploit the fact that transition labels and transition pointer offsets are unevenly distributed and are therefore compressible. We reduce automaton size by roughly 20-30% relative to the original representation of [4]. This result is comparable to state-of-the-art dictionary compression techniques such as the LZ-trie method [10], while construction remains efficient in both memory and CPU time.
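The idea that skewed pointer offsets are compressible can be illustrated with a minimal sketch. It assumes a hypothetical serialized layout in which each transition stores a one-byte label followed by its target offset, and it encodes that offset with variable-byte (VByte) coding: 7 data bits per byte, with the high bit marking that more bytes follow. VByte is a standard integer code and is used here only as an illustration. It is not claimed to be the exact encoding used in this paper or in [4]. The function names `vbyte_encode`, `vbyte_decode`, `encode_transitions` and `decode_transitions` are hypothetical.

```python
def vbyte_encode(n: int) -> bytes:
    """Encode a non-negative integer, 7 data bits per byte (low bits first).
    A set high bit means another byte follows."""
    if n < 0:
        raise ValueError("offsets must be non-negative")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # 7 low bits plus a continuation flag
        n >>= 7
    out.append(n)                      # final byte: high bit clear
    return bytes(out)


def vbyte_decode(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one integer starting at buf[pos]; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if b < 0x80:                   # no continuation flag: done
            return value, pos
        shift += 7


def encode_transitions(transitions: list[tuple[int, int]]) -> bytes:
    """Serialize (label, target_offset) pairs under the hypothetical layout:
    one label byte (0..255) followed by a VByte-encoded offset."""
    out = bytearray()
    for label, offset in transitions:
        out.append(label)              # raises ValueError if label > 255
        out += vbyte_encode(offset)
    return bytes(out)


def decode_transitions(buf: bytes) -> list[tuple[int, int]]:
    """Inverse of encode_transitions."""
    result, pos = [], 0
    while pos < len(buf):
        label = buf[pos]
        offset, pos = vbyte_decode(buf, pos + 1)
        result.append((label, offset))
    return result


if __name__ == "__main__":
    # Mostly small offsets, with one distant target (made-up example data).
    ts = [(ord("a"), 3), (ord("b"), 10), (ord("e"), 200),
          (ord("s"), 5), (ord("s"), 70000)]
    buf = encode_transitions(ts)
    assert decode_transitions(buf) == ts
    print(len(buf), "bytes vs", len(ts) * (1 + 4), "with fixed 4-byte pointers")
```

For these five made-up transitions, the variable-length layout takes 13 bytes, compared with 25 bytes for one label byte plus a fixed 4-byte pointer each. The saving depends entirely on how strongly the real offset distribution is concentrated on small values. That distribution is in turn affected by how states are laid out in memory.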