Statistical language models within the algebra of weighted rational languages

  • Authors:
  • Thomas Hanneforth; Kay-Michael Würzner

  • Affiliations:
  • University of Potsdam; University of Potsdam

  • Venue:
  • Acta Cybernetica
  • Year:
  • 2009

Abstract

Statistical language models are an important tool in natural language processing. They represent prior knowledge about a language, usually gained from a set of samples called a corpus. In this paper, we present a novel way of creating N-gram language models using weighted finite automata. The construction of these models is formalised within the algebra underlying weighted finite automata and expressed in terms of weighted rational languages and transductions. Besides the algebra, we make use of five special constant weighted transductions which rely only on the alphabet and the model parameter N. In addition, we discuss efficient implementations of these transductions in terms of virtual constructions.
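
For readers unfamiliar with the N-gram models the paper operates on, the following is a minimal plain-Python sketch of standard maximum-likelihood N-gram estimation from a corpus. It is only an illustration of the underlying model, not the authors' weighted-automaton construction; the function names and the toy corpus are hypothetical.

```python
from collections import Counter
from itertools import islice


def ngrams(tokens, n):
    """Yield all n-grams (as tuples) of a token sequence."""
    return zip(*(islice(tokens, i, None) for i in range(n)))


def train_ngram_model(corpus, n=2, bos="<s>", eos="</s>"):
    """Estimate maximum-likelihood n-gram probabilities from a corpus.

    corpus: iterable of sentences, each a list of tokens.
    Returns a dict mapping (history, word) -> P(word | history).
    """
    ngram_counts = Counter()
    history_counts = Counter()
    for sentence in corpus:
        # Pad with n-1 start symbols and one end symbol.
        padded = [bos] * (n - 1) + list(sentence) + [eos]
        for gram in ngrams(padded, n):
            ngram_counts[gram] += 1
            history_counts[gram[:-1]] += 1
    return {
        (gram[:-1], gram[-1]): count / history_counts[gram[:-1]]
        for gram, count in ngram_counts.items()
    }


if __name__ == "__main__":
    # Hypothetical toy corpus.
    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    model = train_ngram_model(corpus, n=2)
    # e.g. P(cat | the) = 0.5, P(sat | cat) = 1.0
    for (history, word), prob in sorted(model.items()):
        print(history, word, prob)
```

In a weighted-finite-automaton setting such as the one formalised in the paper, probabilities like these would typically appear as arc weights (for instance as negative log probabilities over the tropical semiring), with the counting and normalisation steps expressed as operations on weighted rational languages and transductions rather than as explicit loops.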