In many applications of natural language processing it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations "eat a peach" and "eat a beach" is more likely. Statistical NLP methods determine the likelihood of a word combination according to its frequency in a training corpus. However, the nature of language is such that many word combinations are infrequent and do not occur in a given corpus. In this work we propose a method for estimating the probability of such previously unseen word combinations using available information on "most similar" words.

We describe a probabilistic word association model based on distributional word similarity, and apply it to improving probability estimates for unseen word bigrams in a variant of Katz's back-off model. The similarity-based method yields a 20% perplexity improvement in the prediction of unseen bigrams and statistically significant reductions in speech-recognition error.
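The idea of borrowing probability mass from "most similar" words can be illustrated with a minimal sketch. It is not the model evaluated here: the abstract does not specify the similarity measure, the weighting function, or the neighbourhood size, so the Jensen-Shannon divergence, the exponential weight `exp(-beta * d)`, the parameters `k` and `beta`, the toy corpus, and all function names below are assumptions chosen for illustration. The sketch also omits the discounting and back-off steps of a Katz-style model and shows only how an unseen bigram could receive a nonzero estimate.

```python
import math
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Return maximum-likelihood conditional distributions P(w2 | w1)."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for w1, w2 in zip(sent, sent[1:]):
            counts[w1][w2] += 1
    return {
        w1: {w2: c / sum(nxt.values()) for w2, c in nxt.items()}
        for w1, nxt in counts.items()
    }


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) of two sparse distributions.

    Assumption: used here because it stays finite when the distributions
    have zeros; the source does not name a specific measure.
    """
    total = 0.0
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        m = 0.5 * (pw + qw)
        if pw > 0:
            total += 0.5 * pw * math.log(pw / m)
        if qw > 0:
            total += 0.5 * qw * math.log(qw / m)
    return total


def similarity_estimate(cond, w1, w2, k=2, beta=5.0):
    """Weighted average of P(w2 | w1') over the k contexts w1' closest to w1.

    The weight exp(-beta * divergence) is an illustrative choice.
    """
    neighbours = sorted(
        (js_divergence(cond[w1], cond[other]), other)
        for other in cond
        if other != w1
    )[:k]
    weights = [(math.exp(-beta * d), other) for d, other in neighbours]
    norm = sum(wt for wt, _ in weights)
    return sum(wt * cond[other].get(w2, 0.0) for wt, other in weights) / norm


# Hypothetical toy corpus: "eat" is never followed by "peach" or "beach".
corpus = [
    ["eat", "apple"], ["eat", "pear"],
    ["devour", "apple"], ["devour", "pear"], ["devour", "peach"],
    ["visit", "beach"], ["visit", "park"],
]
cond = train_bigrams(corpus)

mle_peach = cond["eat"].get("peach", 0.0)  # unseen bigram, so MLE gives 0
p_peach = similarity_estimate(cond, "eat", "peach")
p_beach = similarity_estimate(cond, "eat", "beach")
print(mle_peach, p_peach, p_beach)
```

In this toy setting "devour" shares most of its following words with "eat", so it receives a much larger weight than "visit", and the unseen bigram "eat peach" ends up with a higher estimate than "eat beach". In a back-off setting, an estimate of this kind would stand in for the lower-order distribution used to redistribute the probability mass reserved for unseen bigrams.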