Efficient processing of tera-scale text data is an important research topic. This paper proposes a lossless compression method for N-gram language models based on LOUDS (level-order unary degree sequence), a succinct data structure. LOUDS represents a trie with M nodes as a bit string of 2M + 1 bits. We compress this representation further by exploiting the structure of N-gram language models. We also apply variable-length coding and block-wise compression to the values associated with the nodes. In experiments on three large-scale N-gram compression tasks, the method achieved significant compression rates without any loss of information.
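As a rough illustration of the 2M + 1 bit count, the sketch below builds the standard LOUDS bit string for a small trie: nodes are visited in breadth-first order, each node with d children contributes d ones followed by a zero, and the string is prefixed with "10" for a super-root. The function name `louds_bits` and the toy trie are hypothetical, chosen only for this example; the sketch shows plain LOUDS and does not attempt the further N-gram-specific compression, variable-length coding, or block-wise compression described in the abstract.

```python
from collections import deque


def louds_bits(children, root):
    """Return the LOUDS bit string of the tree rooted at `root`.

    `children` maps a node to the ordered list of its children;
    nodes missing from the mapping are treated as leaves.
    """
    bits = ["10"]  # super-root whose only child is the real root
    queue = deque([root])
    while queue:
        node = queue.popleft()
        kids = children.get(node, [])
        # Unary degree code: one '1' per child, terminated by '0'.
        bits.append("1" * len(kids) + "0")
        queue.extend(kids)  # breadth-first (level-order) traversal
    return "".join(bits)


# Hypothetical toy trie holding the n-grams "a", "a b", "a c", "b", "b c".
toy_trie = {
    "<root>": ["a", "b"],
    "a": ["a b", "a c"],
    "b": ["b c"],
}
num_nodes = 6  # root plus five n-gram nodes

bits = louds_bits(toy_trie, "<root>")
print(bits, len(bits), 2 * num_nodes + 1)
```

The count follows directly: the M nodes contribute M zeros, the M - 1 parent-child edges contribute M - 1 ones, and the super-root adds two more bits, for 2M + 1 in total.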