A variable-length category-based n-gram language model

  • Authors:
  • T. R. Niesler; P. C. Woodland

  • Affiliations:
  • Dept. of Eng., Cambridge Univ., UK; Dept. of Eng., Cambridge Univ., UK

  • Venue:
  • ICASSP '96: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 01
  • Year:
  • 1996

Abstract

A language model is presented that is based on word-category n-grams and ambiguous category membership, with n increased selectively to trade compactness for performance. The use of categories leads intrinsically to a compact model with the ability to generalise to unseen word sequences, and diminishes the sparseness of the training data, thereby making larger n feasible. The language model implicitly performs a statistical tagging operation, which may be used explicitly to assign category labels to untagged text. Experiments on the LOB corpus show the optimal model-building strategy to yield improved results with respect to conventional n-gram methods, and when used as a tagger, the model performs well in relation to a standard benchmark.
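As a rough illustration of the kind of model the abstract describes, the sketch below computes a sentence probability under a category-based bigram (n = 2) with ambiguous word-to-category membership, summing over all category sequences in forward-algorithm fashion. All names, probabilities, and toy data are illustrative assumptions, not the authors' implementation, which additionally varies n per context.

```python
# Minimal sketch (assumed toy data) of a category-based bigram LM with
# ambiguous category membership: P(w_1..w_n) is the sum over category
# sequences of prod_i P(w_i | c_i) * P(c_i | c_{i-1}).
from collections import defaultdict

# P(category | previous category): the category n-gram (here n = 2).
cat_bigram = {
    ("<s>", "DET"): 0.6, ("<s>", "NOUN"): 0.4,
    ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.1,
    ("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.3,
}

# P(word | category): a word may belong to several categories (ambiguity).
word_given_cat = {
    "the":  {"DET": 0.5},
    "dogs": {"NOUN": 0.01, "VERB": 0.005},
    "run":  {"NOUN": 0.02, "VERB": 0.05},
}

def sentence_prob(words):
    """Forward-style sum over all category assignments for the sentence."""
    alpha = {"<s>": 1.0}  # probability mass ending in each category
    for w in words:
        new_alpha = defaultdict(float)
        for prev_cat, p_prev in alpha.items():
            for cat, p_word in word_given_cat[w].items():
                p_trans = cat_bigram.get((prev_cat, cat), 0.0)
                new_alpha[cat] += p_prev * p_trans * p_word
        alpha = new_alpha
    return sum(alpha.values())

print(sentence_prob(["the", "dogs", "run"]))
```

Replacing the sum over categories with a max (Viterbi) recovers the most likely category sequence, which is the implicit statistical tagging operation the abstract mentions.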