A semantic term weighting scheme for text categorization

Authors:
Qiming Luo; Enhong Chen; Hui Xiong
Affiliations:
Qiming Luo, Enhong Chen: School of Computer Science and Technology, University of Science and Technology of China, China, and MOE-MS Key Laboratory of Multimedia Computing and Communication of USTC, China
Hui Xiong: Management Science and Information Systems Department, Rutgers University, USA
Venue:
Expert Systems with Applications: An International Journal
Year:
2011

Abstract

Traditional term weighting schemes for text categorization, such as TF-IDF, exploit only the statistical information of terms in documents. In this paper, we instead propose a novel term weighting scheme that exploits the semantics of both categories and indexing terms. Specifically, the semantics of a category are represented by the senses of the terms appearing in its label, together with WordNet's interpretation of those senses. The weight of a term is then correlated with its semantic similarity to a category. Experimental results on three commonly used data sets show that the proposed approach outperforms TF-IDF when the amount of training data is small or when the content of documents is focused on well-defined categories. In addition, the proposed approach compares favorably with two previous studies.
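
The contrast between a purely statistical weight and a category-aware semantic weight can be illustrated with a minimal sketch. It is not the paper's formula, which the abstract does not state. It assumes a hypothetical lookup table `TOY_SIMILARITY` of made-up term-to-category similarity scores in place of WordNet-derived ones, and the functions `tf_idf` and `semantic_weights` are illustrative names only.

```python
import math
from collections import Counter

# ASSUMPTION: made-up similarity scores in [0, 1] between a term and a
# category label. In the paper these would be derived from WordNet senses;
# the values below are invented purely for illustration.
TOY_SIMILARITY = {
    ("wheat", "grain"): 0.9,
    ("harvest", "grain"): 0.6,
    ("price", "grain"): 0.2,
}


def tf_idf(docs):
    """Classic TF-IDF: raw term frequency times log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def semantic_weights(doc, category, similarity, default=0.0):
    """Illustrative weight: term frequency scaled by term-category similarity.

    ASSUMPTION: a simple product is used here; the abstract only says the
    weight is correlated with the similarity, not how.
    """
    return {
        t: c * similarity.get((t, category), default)
        for t, c in Counter(doc).items()
    }


docs = [
    ["wheat", "harvest", "price", "wheat"],
    ["price", "stock", "market"],
]

statistical = tf_idf(docs)[0]
semantic = semantic_weights(docs[0], "grain", TOY_SIMILARITY)

print(statistical)
print(semantic)
```

In this toy corpus "price" occurs in every document, so its TF-IDF weight is zero, while the semantic weight still assigns it a small nonzero value because of its assumed similarity to the category label. This shows only that the two kinds of weight draw on different information. It says nothing about which performs better.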