Finding semantically related words in a text corpus has applications in lexicon induction, word sense disambiguation, and information retrieval, among others. Real-world text, such as text from the World Wide Web, need not be grammatical, so methods that rely on parsing or part-of-speech tagging will not perform well on it. Furthermore, even when the text is grammatically correct, these methods may not scale well to large corpora. Building sets of semantically related words from a corpus of documents, along with allied problems, has been studied extensively in the literature, and most existing techniques rely on part-of-speech or parse information. In this paper, we explore a less expensive method that finds semantically related words in a corpus without parsing or part-of-speech tagging, thereby addressing the above problems. This work focuses on building sets of semantically related words from a corpus of documents using traditional data clustering techniques. We examine some key results and possible applications of this work.
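
To make the idea of clustering words without parsing or part-of-speech tagging concrete, the following Python sketch groups words by the words that appear near them. It is only an illustration under stated assumptions, not the method of the paper: the abstract does not say which features or which clustering algorithm are used. The window-based co-occurrence features, the cosine similarity, the seed-word initialization, and the names `context_vectors`, `cosine`, and `kmeans_words` are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter, defaultdict


def context_vectors(docs, window=2):
    """Map each word to counts of the words seen within `window` positions of it.

    Assumption: plain regex tokenization stands in for any linguistic analysis.
    """
    vectors = defaultdict(Counter)
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def kmeans_words(vectors, seeds, iterations=10):
    """K-means-style clustering of words, one cluster per seed word.

    Assumption: centroids start at the seed words' vectors, and each word
    goes to the centroid with the highest cosine similarity.
    """
    centroids = [dict(vectors[s]) for s in seeds]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in seeds]
        for word in sorted(vectors):
            sims = [cosine(vectors[word], c) for c in centroids]
            clusters[sims.index(max(sims))].append(word)
        # Recompute each centroid as the mean of its members' vectors.
        for idx, members in enumerate(clusters):
            if members:
                total = Counter()
                for w in members:
                    total.update(vectors[w])
                centroids[idx] = {k: v / len(members) for k, v in total.items()}
    return clusters


if __name__ == "__main__":
    # Toy, made-up corpus purely for demonstration.
    docs = [
        "the cat chased the mouse",
        "the dog chased the cat",
        "stocks fell as markets closed",
        "markets rose as stocks rallied",
    ]
    vecs = context_vectors(docs)
    print(kmeans_words(vecs, seeds=["cat", "stocks"]))
```

Because the sketch uses only tokenization and counting, it runs on ungrammatical text just as well as on clean text, which is the property the abstract emphasizes. On a real corpus one would likely weight the counts (for example with an inverse-document-frequency scheme) and choose the number of clusters more carefully; both are left out here to keep the example short.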