A semi-supervised feature clustering algorithm with application to word sense disambiguation

Authors:
Zheng-Yu Niu;Dong-Hong Ji;Chew Lim Tan
Affiliations:
Institute for Infocomm Research, Singapore;Institute for Infocomm Research, Singapore;National University of Singapore, Singapore
Venue:
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Year:
2005

Citing 21
Cited 2

Word sense disambiguation using a second language monolingual corpus

Computational Linguistics
Distributional clustering of words for text classification

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone

SIGDOC '86 Proceedings of the 5th annual international conference on Systems documentation
Document clustering using word clusters via the information bottleneck method

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Co-clustering documents and words using bipartite spectral graph partitioning

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Iterative Double Clustering for Unsupervised and Semi-supervised Learning

EMCL '01 Proceedings of the 12th European Conference on Machine Learning
Enhanced word clustering for hierarchical text classification

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Distributional word clusters vs. words for text categorization

The Journal of Machine Learning Research
An extensive empirical study of feature selection metrics for text classification

The Journal of Machine Learning Research
Information-theoretic co-clustering

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Automatic word sense discrimination

Computational Linguistics - Special issue on word sense disambiguation
Disambiguating highly ambiguous words

Computational Linguistics - Special issue on word sense disambiguation
Using corpus statistics and WordNet relations for sense identification

Computational Linguistics - Special issue on word sense disambiguation
Unsupervised word sense disambiguation rivaling supervised methods

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Word translation disambiguation using bilingual bootstrapping

Computational Linguistics
Instance based learning with automatic feature selection applied to word sense disambiguation

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Word sense disambiguation by learning from unlabeled data

ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
An empirical evaluation of knowledge sources and learning algorithms for word sense disambiguation

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Finding predominant word senses in untagged text

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Learning word senses with feature selection and order identification capabilities

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Word sense disambiguation using label propagation based semi-supervised learning

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics

Semi-supervised Clustering for Word Instances and Its Effect on Word Sense Disambiguation

CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
Macro features based text categorization

ICONIP'11 Proceedings of the 18th international conference on Neural Information Processing - Volume Part II

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper we investigate an application of feature clustering for word sense disambiguation, and propose a semisupervised feature clustering algorithm. Compared with other feature clustering methods (ex. supervised feature clustering), it can infer the distribution of class labels over (unseen) features unavailable in training data (labeled data) by the use of the distribution of class labels over (seen) features available in training data. Thus, it can deal with both seen and unseen features in feature clustering process. Our experimental results show that feature clustering can aggressively reduce the dimensionality of feature space, while still maintaining state of the art sense disambiguation accuracy. Furthermore, when combined with a semi-supervised WSD algorithm, semi-supervised feature clustering outperforms other dimensionality reduction techniques, which indicates that using unlabeled data in learning process helps to improve the performance of feature clustering and sense disambiguation.