Finite automata for compact representation of tuple dictionaries

Authors:
Jan Daciuk; Gertjan van Noord
Affiliations:
Alfa-Informatica, Rijksuniversiteit Groningen, Oude Kijk in 't Jatstraat 26, Postbus 716, 9700 AS Groningen, The Netherlands (both authors)
Venue:
Theoretical Computer Science - Implementation and application automata
Year:
2004

Abstract

A generalization of the dictionary data structure, called a tuple dictionary, is described. A tuple dictionary represents a mapping from n-tuples of strings to some value. This data structure is motivated by practical applications in speech and language processing, in which very large tuple dictionaries are used to represent language models. A technique for the compact representation of tuple dictionaries is presented. The technique can be seen as an application and extension of perfect hashing by means of finite-state automata. Preliminary experiments indicate that the technique yields considerable space savings, of up to 90% in practice.
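The abstract does not spell out the construction, so the following is only a minimal sketch, under our own assumptions, of the general idea it names: perfect hashing with an acyclic finite-state automaton, used to turn string tuples into tuples of small integers. Each state records how many accepted strings are reachable from it, which lets a lookup compute the lexicographic rank of a word while walking the automaton. The names `PerfectHash`, `TupleDictionary`, `index` and `get` are hypothetical. The automaton here is an unminimized trie rather than a minimal automaton, and storing encoded tuples in a sorted array searched with `bisect` is an illustrative choice, not the authors' method.

```python
from bisect import bisect_left


class State:
    """One state of an acyclic automaton (here an unminimized trie)."""

    def __init__(self):
        self.edges = {}    # label -> State
        self.final = False
        self.count = 0     # number of accepted strings reachable from here


class PerfectHash:
    """Hypothetical sketch: maps each string of a fixed set to 0..n-1.

    Assumption: a real implementation would minimize the automaton so that
    common suffixes are shared; the numbering scheme works the same way.
    """

    def __init__(self, words):
        self.root = State()
        for w in set(words):
            node = self.root
            for ch in w:
                node = node.edges.setdefault(ch, State())
            node.final = True
        self._count(self.root)

    def _count(self, node):
        # Size of the right language of this state.
        node.count = int(node.final) + sum(
            self._count(child) for child in node.edges.values())
        return node.count

    def index(self, word):
        """Return the lexicographic rank of word, or None if absent."""
        node, rank = self.root, 0
        for ch in word:
            if node.final:
                rank += 1  # the shorter string ending here sorts first
            for label, child in node.edges.items():
                if label < ch:
                    rank += child.count  # skip all strings on smaller branches
            node = node.edges.get(ch)
            if node is None:
                return None
        return rank if node.final else None


class TupleDictionary:
    """Hypothetical sketch: maps n-tuples of strings to values.

    Each distinct string is stored once in the automaton; tuples are kept
    as sorted tuples of integers with a parallel list of values.
    Assumption: keys are unique.
    """

    def __init__(self, items):
        items = list(items)
        self.hash = PerfectHash(w for key, _ in items for w in key)
        encoded = sorted(((self._encode(k), v) for k, v in items),
                         key=lambda kv: kv[0])
        self.keys = [k for k, _ in encoded]
        self.values = [v for _, v in encoded]

    def _encode(self, key):
        return tuple(self.hash.index(w) for w in key)

    def get(self, key, default=None):
        code = self._encode(key)
        if None in code:  # some component string is not in the vocabulary
            return default
        i = bisect_left(self.keys, code)
        if i < len(self.keys) and self.keys[i] == code:
            return self.values[i]
        return default


bigrams = {("the", "cat"): 0.4, ("the", "dog"): 0.3, ("a", "cat"): 0.2}
td = TupleDictionary(bigrams.items())
print(td.get(("the", "dog")))  # 0.3
print(td.get(("a", "dog")))    # None
```

In this sketch the vocabulary {a, cat, dog, the} is numbered 0 to 3 in lexicographic order, so ("the", "dog") is stored as (3, 2). The saving comes from storing each word once in the automaton and each tuple as a short sequence of integers.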