We present a data structure for storing huge natural language data sets that is very efficient in both space and access speed. The structure, an LZ (Lempel-Ziv) compressed linked-list trie, goes a step beyond the directed acyclic word graph in automata compression. We use it to store DELAF, a huge French lexicon with syntactic, grammatical, and lexical information associated with each word. The compressed structure can be produced in O(N) time by using suffix trees to find repetitions in the trie. For large data sets, however, space requirements are more prohibitive than time, so suffix arrays are used instead, giving a compression time complexity of O(N log N) for all but the largest data sets.
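To make the underlying structure concrete, the following is a minimal sketch of an uncompressed linked-list trie, the kind of representation the LZ compression step would start from. It is an illustration under stated assumptions, not the authors' implementation: the `Node` layout (first-child and next-sibling indices), the function names, and the sample words are all hypothetical, and the step that finds and replaces repeated node sequences is not shown.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical node layout: one character plus two links into the node array.
    char: str
    is_word: bool = False
    first_child: int = -1   # index of the first child, -1 if none
    next_sibling: int = -1  # index of the next sibling, -1 if last in the list


def build_linked_list_trie(words):
    """Insert each word into a trie stored as a flat list of nodes."""
    nodes = [Node(char="")]  # index 0 is the root
    for word in words:
        current = 0
        for ch in word:
            # Walk the sibling list of the current node's children.
            child = nodes[current].first_child
            prev = -1
            while child != -1 and nodes[child].char != ch:
                prev = child
                child = nodes[child].next_sibling
            if child == -1:
                # No matching child: append a new node and link it in.
                nodes.append(Node(char=ch))
                child = len(nodes) - 1
                if prev == -1:
                    nodes[current].first_child = child
                else:
                    nodes[prev].next_sibling = child
            current = child
        nodes[current].is_word = True
    return nodes


def contains(nodes, word):
    """Return True if the word was inserted into the trie."""
    current = 0
    for ch in word:
        child = nodes[current].first_child
        while child != -1 and nodes[child].char != ch:
            child = nodes[child].next_sibling
        if child == -1:
            return False
        current = child
    return nodes[current].is_word


nodes = build_linked_list_trie(["chat", "chats", "chien"])
print(contains(nodes, "chats"), contains(nodes, "cha"))
```

Because the trie is a flat array of small records, repeated runs of nodes (for example, shared inflectional endings) appear as repeated subsequences, which is what makes a Lempel-Ziv style replacement of repetitions by references applicable in principle.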