Traditional term weighting schemes in text categorization, such as TF-IDF, exploit only the statistical information of terms in documents. In this paper, we instead propose a novel term weighting scheme that exploits the semantics of categories and indexing terms. Specifically, the semantics of a category are represented by the senses of the terms appearing in its label, together with the interpretation of those senses given by WordNet. The weight of a term is then correlated with its semantic similarity to a category. Experimental results on three commonly used data sets show that the proposed approach outperforms TF-IDF when the amount of training data is small or when the content of documents is focused on well-defined categories. In addition, the proposed approach compares favorably with two previous studies.
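
The idea of weighting a term by its semantic similarity to a category can be illustrated with a minimal sketch. The abstract does not give the actual weighting formula, so everything below is an assumption made for illustration: the toy hypernym tree `TOY_HYPERNYMS` stands in for WordNet, `path_similarity` is a simple inverse-path-length measure, and `semantic_weights` multiplies raw term frequency by that similarity. None of these names or choices come from the paper.

```python
from collections import Counter

# Hypothetical toy hypernym tree (child -> parent), standing in for WordNet.
TOY_HYPERNYMS = {
    "sport": "activity",
    "football": "sport",
    "tennis": "sport",
    "goal": "football",
    "finance": "activity",
    "bank": "finance",
    "loan": "bank",
}


def path_to_root(term):
    """Return the chain [term, parent, grandparent, ...] in the toy tree."""
    path = [term]
    while path[-1] in TOY_HYPERNYMS:
        path.append(TOY_HYPERNYMS[path[-1]])
    return path


def path_similarity(a, b):
    """Assumed measure: 1 / (1 + path length via the lowest common ancestor).

    Returns 0.0 when the two terms share no ancestor in the toy tree.
    """
    path_a, path_b = path_to_root(a), path_to_root(b)
    steps_in_b = {node: i for i, node in enumerate(path_b)}
    for steps_in_a, node in enumerate(path_a):
        if node in steps_in_b:
            return 1.0 / (1 + steps_in_a + steps_in_b[node])
    return 0.0


def semantic_weights(tokens, category_label):
    """Assumed scheme: term frequency scaled by similarity to the category label."""
    tf = Counter(tokens)
    return {t: count * path_similarity(t, category_label) for t, count in tf.items()}


if __name__ == "__main__":
    doc = ["football", "football", "loan", "the"]
    for term, weight in semantic_weights(doc, "sport").items():
        print(f"{term}: {weight:.2f}")
```

In this sketch a term close to the category label ("football" for the category "sport") keeps most of its frequency, while an unrelated or unknown term is down-weighted without consulting any corpus statistics. This is consistent with the abstract's observation that such a scheme can help when training data is scarce. A realistic version would replace the toy tree with WordNet senses and a sense-level similarity measure.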