Ziv Lempel Compression of Huge Natural Language Data Tries Using Suffix Arrays

Authors:
Strahil Ristov; Eric Laporte
Venue:
CPM '99 Proceedings of the 10th Annual Symposium on Combinatorial Pattern Matching
Year:
1999

Abstract

We present a data structure for storing huge natural language data sets that is very efficient in both space and access speed. The structure is an LZ (Ziv-Lempel) compressed linked-list trie, and it goes a step beyond the directed acyclic word graph in automata compression. We use the structure to store DELAF, a huge French lexicon with syntactic, grammatical and lexical information associated with each word. The compressed structure can be produced in O(N) time by using suffix trees to find repetitions in the trie, but for large data sets the space requirements are more prohibitive than the time, so suffix arrays are used instead, giving a compression time complexity of O(N log N) for all but the largest data sets.
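To illustrate how a suffix array exposes repetitions, the following is a minimal sketch and not the authors' implementation. It assumes the trie has already been flattened into a sequence of symbols (a plain string stands in for it here). The suffix array is built by naive sorting, which is slower than the O(N log N) bound mentioned above. The function names `build_suffix_array` and `longest_repeat` are hypothetical. The longest repeated run it reports is the kind of repetition an LZ-style scheme could replace with a pointer to the earlier occurrence.

```python
def build_suffix_array(seq):
    """Return the start positions of all suffixes of seq in sorted order.

    Naive construction: sorts the suffixes directly, so it is only
    suitable for small illustrative inputs.
    """
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def longest_repeat(seq):
    """Return (length, earlier_pos, later_pos) of the longest repeated run.

    Suffixes that share a long prefix are adjacent in the suffix array,
    so it is enough to compare neighbouring entries.
    Returns (0, -1, -1) if no symbol occurs twice.
    """
    sa = build_suffix_array(seq)
    best = (0, -1, -1)
    for a, b in zip(sa, sa[1:]):
        # Length of the common prefix of two adjacent suffixes.
        k = 0
        while a + k < len(seq) and b + k < len(seq) and seq[a + k] == seq[b + k]:
            k += 1
        if k > best[0]:
            best = (k, min(a, b), max(a, b))
    return best


print(build_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
print(longest_repeat("banana"))      # (3, 1, 3): "ana" at positions 1 and 3
```

In this toy input the two occurrences of "ana" overlap, which an LZ77-style pointer can still encode. Whether the compressed trie described in the abstract permits overlapping references is not stated there.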