A comparative study for wordnet guided text representation

Authors:
Jian Zhang; Chunping Li
Affiliations:
School of Software, Tsinghua University, Beijing, China (both authors)
Venue:
AI'05 Proceedings of the 18th Australian Joint conference on Advances in Artificial Intelligence
Year:
2005

References (3):
An Evaluation of Statistical Approaches to Text Categorization. Information Retrieval.
Word sense disambiguation using Conceptual Density. COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1.
RCV1: A New Benchmark Collection for Text Categorization Research. The Journal of Machine Learning Research.

Cited by (1):
BidTerm Suggestion for Advertising Webpages. ASONAM '12 Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012).

Abstract

Text information processing depends critically on proper text representation. A common and naïve way to represent a document is as a bag of its component words [1], but this ignores semantic relations between words, such as synonymy and hypernymy-hyponymy between nouns. This paper presents a model that represents a document in terms of the synonym sets (synsets) in WordNet [2]; each synset stands for a concept corresponding to words in the document. The Vector Space Model describes a document as a vector over orthogonal term dimensions. We replace terms with concepts to build a Concept Vector Space Model (CVSM) for the training set. Experiments on the Reuters Corpus Volume I (RCV1) dataset show that the results are satisfactory.
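The difference between the term-based Vector Space Model and a concept-based one can be illustrated with a small sketch. It assumes a hypothetical hand-written word-to-synset map (TOY_SYNSETS) standing in for a real WordNet lookup, uses raw term frequencies as weights, and skips word sense disambiguation entirely; none of these choices are taken from the paper. It shows how two documents that use synonyms share no term dimensions except one, yet collapse onto the same concept dimensions.

```python
from collections import Counter
import math

# Hypothetical toy word -> synset map standing in for a WordNet lookup.
# The IDs mimic WordNet-style naming but are illustrative only; a real
# system would query WordNet and disambiguate among multiple senses.
TOY_SYNSETS = {
    "car": "car.n.01",
    "automobile": "car.n.01",
    "auto": "car.n.01",
    "price": "price.n.01",
    "cost": "price.n.01",
}


def term_vector(tokens):
    """Bag of words: one dimension per distinct word, weighted by raw frequency."""
    return Counter(tokens)


def concept_vector(tokens, synsets=TOY_SYNSETS):
    """Concept vector: map each word to its synset ID.

    Assumption: words with no synset in the map keep their surface form
    as a dimension, so they are not silently dropped.
    """
    return Counter(synsets.get(t, t) for t in tokens)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


doc_a = "car price rises".split()
doc_b = "automobile cost rises".split()

print(round(cosine(term_vector(doc_a), term_vector(doc_b)), 3))        # 0.333
print(round(cosine(concept_vector(doc_a), concept_vector(doc_b)), 3))  # 1.0
```

In the term space the two documents overlap only on "rises", while in the concept space "car"/"automobile" and "price"/"cost" fall on shared synset dimensions. Keeping unmapped words as their own dimensions is one possible design choice; a system could instead discard them or back off to hypernyms.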