We describe work in progress aimed at developing methods for automatically constructing a lexicon using only statistical data derived from analysis of corpora, a problem we call lexical optimization. Specifically, we use statistical methods alone to obtain information equivalent to syntactic categories, and to discover the semantically meaningful units of text, which may be multi-word units or polysemous terms-in-context. Our guiding principle is to employ a notion of "meaningfulness" that can be quantified information-theoretically, so that plausible variants of a lexicon can be judged relative to each other. We describe a technique of this nature called information-theoretic co-clustering and give results of a series of experiments built around it that demonstrate the main ingredients of lexical optimization. We conclude by describing our plans for further improvements, and for applying the same mathematical principles to other problems in natural language processing.
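The abstract does not spell out the co-clustering procedure itself. As a rough illustration of the information-theoretic co-clustering idea, the following toy sketch jointly clusters the rows (words) and columns (contexts) of a co-occurrence distribution by greedy alternating reassignment, maximizing the mutual information retained between row clusters and column clusters, which is equivalent to minimizing the loss I(X;Y) − I(X̂;Ŷ). The function names (`itcc`, `collapsed`) and the greedy search are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def mutual_info(p):
    """I(X;Y) in nats for a joint distribution p over (row, column)."""
    px = p.sum(axis=1, keepdims=True)   # row marginal, shape (n, 1)
    py = p.sum(axis=0, keepdims=True)   # column marginal, shape (1, m)
    mask = p > 0                        # skip zero cells (0 * log 0 := 0)
    return float((p[mask] * np.log(p[mask] / (px @ py)[mask])).sum())

def collapsed(p, rows, cols, k, l):
    """Joint over (row cluster, column cluster) induced by the assignments."""
    q = np.zeros((k, l))
    np.add.at(q, (rows[:, None], cols[None, :]), p)
    return q

def itcc(p, k, l, iters=10, seed=0):
    """Toy greedy co-clustering: alternately move each row, then each
    column, to the cluster that maximizes I(row cluster; column cluster).
    Each move includes the current assignment, so the objective never
    decreases (a local search, not a guaranteed global optimum)."""
    rng = np.random.default_rng(seed)
    rows = rng.integers(k, size=p.shape[0])
    cols = rng.integers(l, size=p.shape[1])
    for _ in range(iters):
        for i in range(p.shape[0]):
            scores = []
            for r in range(k):
                trial = rows.copy()
                trial[i] = r
                scores.append(mutual_info(collapsed(p, trial, cols, k, l)))
            rows[i] = int(np.argmax(scores))
        for j in range(p.shape[1]):
            scores = []
            for c in range(l):
                trial = cols.copy()
                trial[j] = c
                scores.append(mutual_info(collapsed(p, rows, trial, k, l)))
            cols[j] = int(np.argmax(scores))
    return rows, cols

# Hypothetical word-by-context co-occurrence counts with two blocks.
counts = np.array([[4, 4, 0, 0],
                   [4, 4, 0, 0],
                   [0, 0, 4, 4],
                   [0, 0, 4, 4]], dtype=float)
p = counts / counts.sum()
rows, cols = itcc(p, k=2, l=2)
```

Under this view, "meaningfulness" is quantified as the mutual information a candidate clustering preserves, so two plausible variants of a lexicon can be compared by a single number, as the abstract suggests.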