Word clustering and disambiguation based on co-occurrence data
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
We address the problem of automatically constructing a thesaurus by clustering words based on corpus data. We view this problem as that of estimating a joint distribution over the Cartesian product of a partition of a set of nouns and a partition of a set of verbs, and propose a learning algorithm based on the Minimum Description Length (MDL) Principle for such estimation. We empirically compared our MDL-based method against the Maximum Likelihood Estimator in word clustering and found that the former outperforms the latter. We also evaluated the method by conducting PP-attachment disambiguation experiments using an automatically constructed thesaurus. Our experimental results indicate that such a thesaurus can be used to improve disambiguation accuracy.
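To make the estimation problem more concrete, the following is a minimal sketch of how a description length might be scored for one candidate pair of partitions. It is not taken from the paper: the factorization P(n, v) = P(C_n, C_v) P(n | C_n) P(v | C_v), the free-parameter count, and the (k/2) log2 N model penalty are assumptions based on one common form of MDL, and the function name `description_length` and the toy counts are hypothetical.

```python
import math
from collections import Counter


def description_length(counts, noun_class, verb_class):
    """Return (data_bits, model_bits) for a hard clustering of nouns and verbs.

    counts:      dict mapping (noun, verb) -> observed co-occurrence frequency
    noun_class:  dict mapping each noun to a class label (a partition of nouns)
    verb_class:  dict mapping each verb to a class label (a partition of verbs)

    ASSUMPTION: P(n, v) = P(Cn, Cv) * P(n | Cn) * P(v | Cv), with every
    probability estimated by relative frequency.
    """
    total = sum(counts.values())
    cell = Counter()        # frequency of each (noun class, verb class) pair
    noun_freq = Counter()   # marginal frequency of each noun
    verb_freq = Counter()   # marginal frequency of each verb
    for (n, v), c in counts.items():
        cell[(noun_class[n], verb_class[v])] += c
        noun_freq[n] += c
        verb_freq[v] += c

    noun_class_freq = Counter()
    for n, c in noun_freq.items():
        noun_class_freq[noun_class[n]] += c
    verb_class_freq = Counter()
    for v, c in verb_freq.items():
        verb_class_freq[verb_class[v]] += c

    # Data description length: negative log-likelihood of the sample, in bits.
    data_bits = 0.0
    for (n, v), c in counts.items():
        cn, cv = noun_class[n], verb_class[v]
        p = (cell[(cn, cv)] / total) \
            * (noun_freq[n] / noun_class_freq[cn]) \
            * (verb_freq[v] / verb_class_freq[cv])
        data_bits -= c * math.log2(p)

    # Model description length. ASSUMPTION: (k / 2) * log2(N) bits, where k is
    # the number of free parameters of the factorized model.
    k_n, k_v = len(noun_class_freq), len(verb_class_freq)
    free_params = (k_n * k_v - 1) + (len(noun_freq) - k_n) + (len(verb_freq) - k_v)
    model_bits = 0.5 * free_params * math.log2(total)
    return data_bits, model_bits


# Hypothetical toy data: two drinkable nouns and two edible nouns.
counts = {("wine", "drink"): 4, ("beer", "drink"): 4,
          ("bread", "eat"): 4, ("rice", "eat"): 4}
verbs = {"drink": "V1", "eat": "V2"}
singletons = {"wine": 1, "beer": 2, "bread": 3, "rice": 4}
by_meaning = {"wine": "A", "beer": "A", "bread": "B", "rice": "B"}
mismatched = {"wine": "A", "bread": "A", "beer": "B", "rice": "B"}

for name, nouns in [("singletons", singletons),
                    ("by_meaning", by_meaning),
                    ("mismatched", mismatched)]:
    data_bits, model_bits = description_length(counts, nouns, verbs)
    print(name, data_bits, model_bits, data_bits + model_bits)
```

Under these assumptions, the singleton partition corresponds to the unclustered maximum-likelihood fit: it never has a larger data term than a coarser partition, but it pays for more parameters. A criterion of this kind prefers a coarser partition only when the saving in model bits outweighs any loss in fit, which is the trade-off that separates an MDL-style estimate from maximum likelihood alone.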