We present Tightly Packed Tries (TPTs), a compact implementation of read-only, compressed trie structures with fast on-demand paging and short load times. We demonstrate the benefits of TPTs for storing n-gram back-off language models and phrase tables for statistical machine translation. Encoded as TPTs, these databases require less space than flat text file representations of the same data compressed with the gzip utility. At the same time, they can be mapped into memory quickly and be searched directly in time linear in the length of the key, without the need to decompress the entire file. The overhead for local decompression during search is marginal.
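The core idea above — a trie serialized into one flat, read-only buffer that can be memory-mapped and searched in place, in time linear in the key length — can be illustrated with a minimal sketch. The node layout below (a value flag, an optional 4-byte value, then sorted child entries with absolute offsets) is an assumption for illustration only; it is not the paper's actual encoding, which additionally applies local compression to node records.

```python
import struct

def pack_trie(entries):
    """Serialize {bytes: int} entries into one flat byte buffer.

    Illustrative node layout (NOT the paper's actual encoding):
      [has_value:1][value:4 if present][n_children:1]
      followed by n_children sorted (key_byte:1, child_offset:4) pairs.
    The root node's offset is appended as the last 4 bytes.
    """
    # Build an in-memory nested-dict trie first; None marks a stored value.
    root = {}
    for key, val in entries.items():
        node = root
        for b in key:
            node = node.setdefault(b, {})
        node[None] = val

    buf = bytearray()

    def emit(node):
        # Emit children first (bottom-up) so their offsets are known.
        children = sorted((k, v) for k, v in node.items() if k is not None)
        offsets = [emit(child) for _, child in children]
        pos = len(buf)
        if None in node:
            buf.append(1)
            buf.extend(struct.pack('<I', node[None]))
        else:
            buf.append(0)
        buf.append(len(children))
        for (byte, _), off in zip(children, offsets):
            buf.append(byte)
            buf.extend(struct.pack('<I', off))
        return pos

    root_off = emit(root)
    buf.extend(struct.pack('<I', root_off))
    return bytes(buf)

def lookup(buf, key):
    """Search the packed buffer directly, one node per key byte.

    No global decompression: only the nodes on the search path are read,
    which is what makes memory-mapped, on-demand paging effective.
    """
    pos = struct.unpack_from('<I', buf, len(buf) - 4)[0]
    for b in key:
        p = pos + 1 + (4 if buf[pos] else 0)  # skip flag and any value
        n_children = buf[p]
        p += 1
        child = None
        for i in range(n_children):  # linear scan; could binary-search
            if buf[p + i * 5] == b:
                child = struct.unpack_from('<I', buf, p + i * 5 + 1)[0]
                break
        if child is None:
            return None
        pos = child
    if buf[pos]:
        return struct.unpack_from('<I', buf, pos + 1)[0]
    return None
```

In a real deployment the buffer would come from `mmap`, so a lookup touches only the pages holding the nodes on its path; the sketch omits that and the variable-length integer coding that gives TPTs their compactness.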