Building clusters of related words: an unsupervised approach

  • Authors:
  • P. Deepak; Delip Rao; Deepak Khemani

  • Affiliations:
  • Services Innovation Research Center, IBM India Research Lab, Bangalore; Dept of Computer Science and Engineering, IIT Madras, Chennai; Dept of Computer Science and Engineering, IIT Madras, Chennai

  • Venue:
  • PRICAI'06: Proceedings of the 9th Pacific Rim International Conference on Artificial Intelligence
  • Year:
  • 2006

Abstract

Finding semantically related words in a text corpus has applications in lexicon induction, word sense disambiguation, and information retrieval, to name a few. Real-world text, such as text from the World Wide Web, need not be grammatical. Hence, methods that rely on parsing or part-of-speech tagging will not perform well in these applications. Further, even when the text is grammatical, these methods may not scale well to large corpora. Building sets of semantically related words from a corpus of documents, along with allied problems, has been studied extensively in the literature. Most of these techniques rely on part-of-speech or parse information. In this paper, we explore a less expensive method that finds semantically related words in a corpus without parsing or part-of-speech tagging, and so addresses the above problems. This work focuses on building sets of semantically related words from a corpus of documents using traditional data clustering techniques. We examine some key results and possible applications of this work.
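
The abstract does not say which features or which clustering algorithm the authors use, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes whitespace tokenization, co-occurrence counts within a small window as word features, cosine similarity, and single-link grouping at a fixed threshold. The names `context_vectors`, `cosine`, `cluster_words`, `window`, and `threshold` are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(docs, window=2):
    """Map each word to counts of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for doc in docs:
        # Assumption: plain whitespace tokenization, no parsing or POS tagging.
        tokens = doc.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_words(docs, threshold=0.5, window=2):
    """Single-link grouping: words whose context vectors have cosine
    similarity >= threshold end up in the same cluster."""
    vectors = context_vectors(docs, window)
    words = sorted(vectors)
    parent = {w: w for w in words}  # union-find forest over words

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for w in words:
        clusters[find(w)].add(w)
    return list(clusters.values())
```

Calling `cluster_words(["red apple", "green apple", "fast car", "slow car"])` groups words that share contexts: "red" with "green" and "fast" with "slow". The pairwise comparison is quadratic in vocabulary size, so a large corpus would need a cheaper clustering scheme than this sketch.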