A Learning Criterion for Stochastic Rules
Machine Learning - Computational learning theory
Class-based n-gram models of natural language
Computational Linguistics
Selection and information: a class-based approach to lexical relationships
Training and scaling preference functions for disambiguation
Computational Linguistics
Improving statistical language model performance with automatically generated word hierarchies
Computational Linguistics
Explorations in Automatic Thesaurus Discovery
Stochastic Complexity in Statistical Inquiry Theory
Inducing Probabilistic Grammars by Bayesian Model Merging
ICGI '94 Proceedings of the Second International Colloquium on Grammatical Inference and Applications
Building a large annotated corpus of English: the Penn Treebank
Computational Linguistics - Special issue on using large corpora: II
Generalizing case frames using a thesaurus and the MDL principle
Computational Linguistics
Automatic learning for semantic collocation
ANLC '92 Proceedings of the third conference on Applied natural language processing
Word clustering and disambiguation based on co-occurrence data
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Automatic retrieval and clustering of similar words
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Statistical models for unsupervised prepositional phrase attachment
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Structural ambiguity and lexical relations
ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Distributional clustering of English words
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Similarity-based estimation of word cooccurrence probabilities
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
Noun classification from predicate-argument structures
ACL '90 Proceedings of the 28th annual meeting on Association for Computational Linguistics
Generalizing automatically generated selectional patterns
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 2
A rule-based approach to prepositional phrase attachment disambiguation
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 2
Clustering words with the MDL principle
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Inducing a semantically annotated lexicon via EM-based clustering
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
An unsupervised approach to prepositional phrase attachment using contextually similar words
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
A maximum entropy model for prepositional phrase attachment
HLT '94 Proceedings of the workshop on Human Language Technology
Automatic thesaurus construction based on grammatical relations
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2
Information retrieval based on context distance and morphology
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Co-occurrence Retrieval: A Flexible Framework for Lexical Distributional Similarity
Computational Linguistics
An efficient clustering algorithm for class-based language models
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
A generalized framework for revealing analogous themes across related topics
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Word Clustering for Collocation-Based Word Sense Disambiguation
CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
A word clustering approach for language model-based sentence retrieval in question answering systems
Proceedings of the 18th ACM conference on Information and knowledge management
Spectral clustering for Chinese word
FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 1
Long distance bigram models applied to word clustering
Pattern Recognition
A nearest-neighbor method for resolving PP-Attachment ambiguity
IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing
A novel neighborhood based document smoothing model for information retrieval
Information Retrieval
We address the problem of clustering words (or constructing a thesaurus) from co-occurrence data, and of performing syntactic disambiguation with the acquired word classes. We view clustering as the problem of estimating a class-based probability distribution that specifies the joint probabilities of word pairs. We propose an efficient algorithm, based on the Minimum Description Length (MDL) principle, for estimating such a probability model. Our clustering method is a natural extension of the one proposed by Brown, Della Pietra, deSouza, Lai, and Mercer (1992). We then propose a syntactic disambiguation method that combines automatically constructed word classes with a hand-made thesaurus. The overall disambiguation accuracy achieved by our method is 88.2%, which compares favorably with the accuracies obtained by state-of-the-art disambiguation methods.
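The MDL criterion mentioned above can be illustrated with a minimal sketch. It is not the algorithm of the paper: it assumes a hard-clustering model of the form P(n, v) = P(Cn, Cv) · P(n | Cn) · P(v | Cv), maximum-likelihood parameter estimates, and a model cost of (k/2) · log2(N) bits for k free parameters and N observed pairs. The function name `description_length`, the toy data, and the parameter count are all illustrative assumptions.

```python
import math
from collections import Counter


def description_length(pairs, noun_class, verb_class):
    """Total description length in bits of (noun, verb) pairs under a
    hard-clustering model (an assumed form, not taken from the source):
        P(n, v) = P(Cn, Cv) * P(n | Cn) * P(v | Cv)
    """
    n_total = len(pairs)

    # Counts needed for the maximum-likelihood estimates.
    joint = Counter((noun_class[n], verb_class[v]) for n, v in pairs)
    noun_count = Counter(n for n, _ in pairs)
    verb_count = Counter(v for _, v in pairs)
    noun_class_count = Counter(noun_class[n] for n, _ in pairs)
    verb_class_count = Counter(verb_class[v] for _, v in pairs)

    # Data description length: negative log-likelihood of the sample.
    data_bits = 0.0
    for n, v in pairs:
        cn, cv = noun_class[n], verb_class[v]
        p = (
            (joint[(cn, cv)] / n_total)
            * (noun_count[n] / noun_class_count[cn])
            * (verb_count[v] / verb_class_count[cv])
        )
        data_bits -= math.log2(p)

    # Model description length: (k / 2) * log2(N), where k counts the
    # free parameters of the assumed model.
    k_n = len(set(noun_class.values()))
    k_v = len(set(verb_class.values()))
    k = (k_n * k_v - 1) + (len(noun_class) - k_n) + (len(verb_class) - k_v)
    model_bits = 0.5 * k * math.log2(n_total)

    return data_bits + model_bits


# Toy co-occurrence data (hypothetical, for illustration only).
pairs = (
    [("wine", "drink")] * 4
    + [("beer", "drink")] * 4
    + [("bread", "eat")] * 4
    + [("rice", "eat")] * 4
)
verbs = {"drink": 0, "eat": 1}

# Every noun in its own class versus nouns grouped by the verb they occur with.
singletons = {"wine": 0, "beer": 1, "bread": 2, "rice": 3}
merged = {"wine": 0, "beer": 0, "bread": 1, "rice": 1}

print(description_length(pairs, singletons, verbs))
print(description_length(pairs, merged, verbs))
```

On this toy data the merged clustering fits the sample equally well but needs fewer parameters, so its total description length is shorter. Under the MDL principle that makes it the preferred model, and a clustering procedure can use such a comparison to decide whether merging two classes is worthwhile.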