Text similarity measurement using concept representation of texts

Authors:
Abhinay Pandya;Pushpak Bhattacharyya
Affiliations:
DA-IICT, Gandhinagar;Dept. of CSE, IIT Bombay
Venue:
PReMI'05: Proceedings of the First International Conference on Pattern Recognition and Machine Intelligence
Year:
2005

Abstract

Measuring the semantic nearness of documents is important for accurate information retrieval and for automated text categorization and classification. Inspired by the observation that a text document contains a semantically coherent set of ideas or topics, this paper presents the design and experimental evaluation of a method for representing a text document as a set of concepts. Based on this representation, we propose a method for measuring the semantic nearness of texts. Our method uses WordNet, a lexico-semantic network of words, and bypasses word sense disambiguation. To show the effectiveness of our representation, we compare the results of text classification and clustering obtained with it against the results obtained with standard techniques.
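
The general idea of a concept-set representation can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes a tiny hand-made lexicon (`TOY_LEXICON`, hypothetical) in place of WordNet, keeps every candidate sense of each word rather than disambiguating, and uses Jaccard overlap as an assumed similarity measure. The function names `concepts` and `concept_similarity` are likewise illustrative.

```python
# Hypothetical stand-in for WordNet: each word maps to ALL of its candidate
# concepts (senses). The entries are invented for illustration only.
TOY_LEXICON = {
    "bank": {"financial_institution", "river_side"},
    "loan": {"financial_transaction"},
    "credit": {"financial_transaction"},
    "deposit": {"financial_transaction", "sediment"},
    "money": {"currency"},
    "cash": {"currency"},
    "river": {"stream"},
    "water": {"liquid"},
}


def concepts(text):
    """Return the concept set of a text, keeping every sense of every word.

    No word sense disambiguation is attempted: an ambiguous word simply
    contributes all of its candidate concepts.
    """
    found = set()
    for word in text.lower().split():
        found |= TOY_LEXICON.get(word, set())
    return found


def concept_similarity(text_a, text_b):
    """Jaccard overlap of two concept sets (an assumed measure, not the paper's)."""
    a, b = concepts(text_a), concepts(text_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # The first pair shares no surface words but maps to the same concepts.
    print(concept_similarity("loan money", "credit cash"))
    print(concept_similarity("loan money", "river water"))
```

In this toy setting, two texts with no words in common can still be judged near each other because they map to the same concepts, which is the kind of effect a word-overlap measure cannot capture. A real implementation would draw concepts from WordNet itself and would need some way to limit the noise introduced by unresolved senses.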