Text information processing depends critically on a proper text representation. A common but naïve approach represents a document as a bag of its component words [1]. This ignores semantic relations between words, such as synonymy and hypernymy-hyponymy between nouns. This paper presents a model that represents a document in terms of the synonym sets (synsets) in WordNet [2], where each synset stands for a concept corresponding to words in the document. The Vector Space Model describes a document as a vector over mutually orthogonal term dimensions. We replace terms with concepts to build a Concept Vector Space Model (CVSM) for the training set. Experiments on the Reuters Corpus Volume I (RCV1) dataset show satisfactory results.
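The contrast between term vectors and concept vectors can be illustrated with a minimal sketch. It rests on several assumptions. A small hand-made word-to-concept table (the hypothetical `TOY_SYNSETS`, with made-up `syn:` identifiers) stands in for WordNet. Each word is given exactly one sense, so no disambiguation step is needed. Words missing from the table are kept as ordinary terms. This is not the authors' CVSM implementation, and it omits term weighting (e.g. tf-idf) and hypernym relations.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for WordNet (identifiers are made up).
# Real WordNet lookups return several candidate synsets per word, which would
# require word sense disambiguation; here each word has one sense by assumption.
TOY_SYNSETS = {
    "car": "syn:car", "automobile": "syn:car", "auto": "syn:car",
    "buy": "syn:buy", "purchase": "syn:buy",
    "price": "syn:price", "cost": "syn:price",
}

def term_vector(tokens):
    """Bag-of-words vector: one dimension per distinct word."""
    return Counter(tokens)

def concept_vector(tokens, lexicon=TOY_SYNSETS):
    """Concept vector: words mapped to concepts; unmapped words kept as terms (assumption)."""
    return Counter(lexicon.get(t, t) for t in tokens)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

doc_a = ["buy", "car", "price"]
doc_b = ["purchase", "automobile", "cost"]

print(round(cosine(term_vector(doc_a), term_vector(doc_b)), 3))        # 0.0
print(round(cosine(concept_vector(doc_a), concept_vector(doc_b)), 3))  # 1.0
```

In the term space the two documents share no dimension, so their similarity is 0 even though they say the same thing. Mapping synonyms onto a shared concept makes the two vectors identical.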