Generalized algorithms for constructing statistical language models

Authors:
Cyril Allauzen;Mehryar Mohri;Brian Roark
Affiliations:
AT&T Labs - Research, Florham Park, NJ;AT&T Labs - Research, Florham Park, NJ;AT&T Labs - Research, Florham Park, NJ
Venue:
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Year:
2003

Abstract

Recent text and speech processing applications, such as speech mining, raise new and more general problems related to the construction of language models. We describe in detail several new and efficient algorithms that address these problems and report experimental results demonstrating their usefulness. We give an algorithm for efficiently computing the expected count of any sequence in a word lattice output by a speech recognizer or, more generally, in an arbitrary weighted automaton; describe a new technique for creating exact representations of n-gram language models by weighted automata whose size is practical for offline use even for a vocabulary of about 500,000 words and an n-gram order n = 6; and present a simple and more general technique for constructing class-based language models that allows each class to represent an arbitrary weighted automaton. An efficient implementation of these algorithms and techniques has been incorporated in a general software library for language modeling, the GRM Library, which includes many other text and grammar processing functionalities.
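To make the notion of an expected count in a word lattice concrete, here is a minimal sketch. It is not the algorithm of the paper and does not use the GRM Library; it is a plain forward-backward computation under simplifying assumptions: the lattice is acyclic and epsilon-free, states are integers numbered in topological order, and arc weights are probabilities. The function name `expected_ngram_counts` and the arc format `(src, dst, word, prob)` are hypothetical choices made for this illustration.

```python
from collections import defaultdict


def expected_ngram_counts(arcs, start, finals, order):
    """Expected counts of all n-grams up to `order` in an acyclic lattice.

    Assumptions (illustrative only): `arcs` is a list of
    (src, dst, word, prob) tuples, states are ints numbered in
    topological order (src < dst), and there are no epsilon arcs.
    """
    out = defaultdict(list)
    states = {start, *finals}
    for src, dst, word, prob in arcs:
        out[src].append((dst, word, prob))
        states.update((src, dst))
    states = sorted(states)

    # Forward weights: total weight of all paths from `start` to each state.
    alpha = defaultdict(float)
    alpha[start] = 1.0
    for s in states:
        for dst, _, p in out[s]:
            alpha[dst] += alpha[s] * p

    # Backward weights: total weight of all paths from each state to a final state.
    beta = defaultdict(float)
    for f in finals:
        beta[f] = 1.0
    for s in reversed(states):
        for dst, _, p in out[s]:
            beta[s] += p * beta[dst]

    total = beta[start]  # weight of all accepting paths (normalizer)
    counts = defaultdict(float)

    def extend(state, ngram, weight):
        # `weight` = forward weight into the n-gram times its arc weights.
        if ngram:
            counts[ngram] += weight * beta[state] / total
        if len(ngram) == order:
            return
        for dst, word, p in out[state]:
            extend(dst, ngram + (word,), weight * p)

    for s in states:
        if alpha[s] > 0.0:
            extend(s, (), alpha[s])
    return dict(counts)


# Toy lattice with four paths: "a a" (0.3), "a c" (0.3), "b a" (0.2), "b c" (0.2).
toy_arcs = [
    (0, 1, "a", 0.6),
    (0, 1, "b", 0.4),
    (1, 2, "a", 0.5),
    (1, 2, "c", 0.5),
]
toy_counts = expected_ngram_counts(toy_arcs, start=0, finals={2}, order=2)
print(toy_counts)
```

In the toy lattice, the expected count of the unigram `a` is 2·0.3 + 0.3 + 0.2 = 1.1, since the path "a a" contributes two occurrences. The recursive enumeration is exponential in the worst case and is meant only to illustrate the quantity being computed, not to be efficient.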