We present a scalable joint language model designed to utilize fine-grain syntactic tags. We discuss the challenges such a design faces and describe solutions that scale well to large tagsets and corpora. We advocate relatively simple tags that require no deep linguistic knowledge of the language, yet provide more structural information than POS tags and can be derived from automatically generated parse trees. This combination of properties allows easy adoption of the model for new languages. We propose two fine-grain tagsets and evaluate our model with these tags, as well as with POS tags and SuperARV tags, in a speech recognition task, and we discuss future directions.