A language model based on word-category n-grams and ambiguous category membership, with n increased selectively to trade compactness for performance, is presented. The use of categories intrinsically yields a compact model that can generalise to unseen word sequences, and it diminishes the sparseness of the training data, making larger n feasible. The language model implicitly performs a statistical tagging operation, which may also be used explicitly to assign categories to untagged text. Experiments on the LOB corpus show that the optimal model-building strategy yields improved results with respect to conventional word n-gram methods, and when used as a tagger the model performs well relative to a standard benchmark.
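The core computation described above can be illustrated with a small sketch of a category-based bigram model under ambiguous category membership: each word may belong to several categories, and the sequence probability sums over all compatible category sequences, P(w_1..w_n) = Σ_{c_1..c_n} Π_i P(w_i | c_i) P(c_i | c_{i-1}). The lexicon and probabilities below are hypothetical toy values, not the paper's LOB statistics, and the forward-trellis formulation is one standard way to evaluate such a model, not necessarily the authors' exact implementation.

```python
from collections import defaultdict

# Hypothetical toy data: word -> possible categories ("run" is ambiguous).
categories = {"the": ["DET"], "run": ["NOUN", "VERB"], "dogs": ["NOUN"]}

# Hypothetical emission probabilities P(word | category).
p_word_given_cat = {("the", "DET"): 1.0,
                    ("run", "NOUN"): 0.2, ("run", "VERB"): 0.5,
                    ("dogs", "NOUN"): 0.3}

# Hypothetical category-bigram probabilities P(c_i | c_{i-1}).
p_cat_bigram = {("<s>", "DET"): 0.6, ("DET", "NOUN"): 0.7,
                ("NOUN", "VERB"): 0.4, ("<s>", "NOUN"): 0.2}

def sentence_prob(words):
    """Sum over all category sequences with a forward trellis:
    alpha maps each possible current category to the total probability
    mass of all category paths ending in that category."""
    alpha = {"<s>": 1.0}
    for w in words:
        new_alpha = defaultdict(float)
        for c in categories.get(w, []):
            emit = p_word_given_cat.get((w, c), 0.0)
            for prev, mass in alpha.items():
                new_alpha[c] += mass * p_cat_bigram.get((prev, c), 0.0) * emit
        alpha = new_alpha
    return sum(alpha.values())
```

For example, `sentence_prob(["the", "dogs", "run"])` sums the two category readings of "run"; only the VERB path has nonzero transition probability here, giving 0.6·1.0 · 0.7·0.3 · 0.4·0.5 = 0.0252. Replacing the sum with a max (Viterbi) over the same trellis recovers the explicit tagging use mentioned in the abstract.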