Tightly packed tries: how to fit large models into memory, and make them load fast, too

  • Authors:
  • Ulrich Germann;Eric Joanis;Samuel Larkin

  • Affiliations:
  • University of Toronto and National Research Council Canada;National Research Council Canada;National Research Council Canada

  • Venue:
  • SETQA-NLP '09 Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present Tightly Packed Tries (TPTs), a compact implementation of read-only, compressed trie structures with fast on-demand paging and short load times. We demonstrate the benefits of TPTs for storing n-gram back-off language models and phrase tables for statistical machine translation. Encoded as TPTs, these databases require less space than flat text file representations of the same data compressed with the gzip utility. At the same time, they can be mapped into memory quickly and be searched directly in time linear in the length of the key, without the need to decompress the entire file. The overhead for local decompression during search is marginal.