Class-based n-gram models of natural language
Computational Linguistics
Stochastic Complexity in Statistical Inquiry Theory
Generalizing case frames using a thesaurus and the MDL principle
Computational Linguistics
Distributional clustering of English words
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
A rule-based approach to prepositional phrase attachment disambiguation
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 2
Clustering words with the MDL principle
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Automatic thesaurus construction based on grammatical relations
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2
Word sense disambiguation in information retrieval revisited
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Word clustering and disambiguation based on co-occurrence data
Natural Language Engineering
Improvements to the Linear Programming Based Scheduling of Web Advertisements
Electronic Commerce Research
Name disambiguation in author citations using a K-way spectral clustering method
Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
Frequency estimates for statistical word similarity measures
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
A transformational-based learner for dependency grammars in discharge summaries
BioMed '02 Proceedings of the ACL-02 workshop on Natural language processing in the biomedical domain - Volume 3
Using co-composition for acquiring syntactic and semantic subcategorisation
ULA '02 Proceedings of the ACL-02 workshop on Unsupervised lexical acquisition - Volume 9
Two-dimensional clustering for text categorization
COLING-02 Proceedings of the 6th conference on Natural language learning - Volume 20
An efficient clustering algorithm for class-based language models
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Clustering Syntactic Positions with Similar Semantic Requirements
Computational Linguistics
Ontology learning: state of the art and open issues
Information Technology and Management
An algorithm for unsupervised topic discovery from broadcast news stories
HLT '02 Proceedings of the second international conference on Human Language Technology Research
Applications of corpus-based semantic similarity and word segmentation to database schema matching
The VLDB Journal — The International Journal on Very Large Data Bases
Using hidden Markov random fields to combine distributional and pattern-based word clustering
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Graph-based word clustering using a web search engine
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Unsupervised methods for determining object and relation synonyms on the web
Journal of Artificial Intelligence Research
Context comparison as a minimum cost flow problem
TextGraphs-1 Proceedings of the First Workshop on Graph Based Methods for Natural Language Processing
Entity Resolution in Texts Using Statistical Learning and Ontologies
ASWC '09 Proceedings of the 4th Asian Conference on The Semantic Web
A graph-theoretic framework for semantic distance
Computational Linguistics
PAC-Bayesian Analysis of Co-clustering and Beyond
The Journal of Machine Learning Research
On context-aware co-clustering with metadata support
Journal of Intelligent Information Systems
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
Wikification via link co-occurrence
Proceedings of the 22nd ACM international conference on information & knowledge management
We address the problem of clustering words (or constructing a thesaurus) based on co-occurrence data, and of using the acquired word classes to improve the accuracy of syntactic disambiguation. We view clustering as the problem of estimating a joint probability distribution over word pairs, such as noun-verb pairs, and propose an efficient algorithm based on the Minimum Description Length (MDL) principle for estimating such a distribution. Our method is a natural extension of those proposed in (Brown et al., 1992) and (Li and Abe, 1996), and overcomes their drawbacks while retaining their advantages. We then combine this clustering method with the disambiguation method of (Li and Abe, 1995) to derive a disambiguation method that uses both automatically constructed thesauruses and a hand-made thesaurus. The overall disambiguation accuracy achieved by our method is 85.2%, which compares favorably with the 82.4% accuracy obtained by the state-of-the-art disambiguation method of (Brill and Resnik, 1994).
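To make the MDL idea concrete, the following is a minimal sketch of how a fixed hard clustering of nouns and verbs could be scored by its description length. It is not the algorithm of the paper. It assumes a model of the form P(n, v) = P(Cn, Cv) P(n | Cn) P(v | Cv) with maximum-likelihood parameters, and a model cost of (k/2) log2 N bits for k free parameters and N observed pairs. The function name, the parameter count, and the toy counts are all illustrative assumptions.

```python
import math
from collections import Counter


def description_length(pair_counts, noun_class, verb_class):
    """Return (total_bits, data_bits, model_bits) for a hard clustering.

    Assumed model (illustrative, not taken from the paper):
        P(n, v) = P(Cn, Cv) * P(n | Cn) * P(v | Cv)
    with maximum-likelihood estimates for every factor.
    """
    total = sum(pair_counts.values())
    class_pair = Counter()
    noun_count, verb_count = Counter(), Counter()
    noun_class_count, verb_class_count = Counter(), Counter()

    # Accumulate the counts needed for the maximum-likelihood estimates.
    for (n, v), c in pair_counts.items():
        cn, cv = noun_class[n], verb_class[v]
        class_pair[(cn, cv)] += c
        noun_count[n] += c
        verb_count[v] += c
        noun_class_count[cn] += c
        verb_class_count[cv] += c

    # Data description length: negative log-likelihood in bits.
    data_bits = 0.0
    for (n, v), c in pair_counts.items():
        cn, cv = noun_class[n], verb_class[v]
        p = (class_pair[(cn, cv)] / total
             * noun_count[n] / noun_class_count[cn]
             * verb_count[v] / verb_class_count[cv])
        data_bits -= c * math.log2(p)

    # Model description length: (k / 2) * log2(N) bits for k free parameters.
    kn = len(set(noun_class.values()))
    kv = len(set(verb_class.values()))
    free_params = (kn * kv - 1) + (len(noun_class) - kn) + (len(verb_class) - kv)
    model_bits = 0.5 * free_params * math.log2(total)

    return data_bits + model_bits, data_bits, model_bits


# Hypothetical toy co-occurrence counts of (noun, verb) pairs.
counts = {("wine", "drink"): 4, ("beer", "drink"): 4,
          ("bread", "eat"): 4, ("rice", "eat"): 4}
verbs = {"drink": "drink", "eat": "eat"}

# Two candidate noun clusterings: merged by selectional behaviour vs. all singletons.
merged = description_length(
    counts, {"wine": 0, "beer": 0, "bread": 1, "rice": 1}, verbs)
split = description_length(
    counts, {"wine": 0, "beer": 1, "bread": 2, "rice": 3}, verbs)
```

On this toy data both clusterings fit the counts equally well, so the merged clustering is preferred only because it needs fewer parameters. A clustering procedure in this spirit would search over candidate merges and keep those that reduce the total; the search strategy itself is not shown here.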