Text Categorization (TC) is the automated assignment of text documents to predefined categories based on their contents. TC has served as an application domain for many learning approaches, which have proven effective; nevertheless, it still poses many challenges to machine learning. In this paper, we propose integrating external lexical information from WordNet to supplement the training data of a semi-supervised clustering algorithm, the "Semi-Supervised Fuzzy c-Means" (ssFCM), which learns from both training and test documents in order to classify new, unseen documents. Our experiments use the Reuters-21578 collection and consist of binary classifications for categories selected from its 115 TOPICS classes. Using the Vector Space Model, each document is represented by its original feature vector augmented with an external feature vector generated from WordNet. We verify experimentally that integrating WordNet improves the performance of ssFCM, effectively addresses the classification of documents into categories that have few training documents, and does not interfere with the use of the training data.
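The two ingredients described above, a term vector augmented with WordNet-derived features and a fuzzy clustering step that uses both labeled (training) and unlabeled (test) documents, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. Several assumptions are labeled as such: the hypothetical `TOY_SYNSETS` dict stands in for a real WordNet lookup; "augmented" is taken to mean concatenation of term counts and concept counts; and the ssFCM variant is simplified. In this simplified variant, labeled memberships stay crisp and fixed, prototypes are initialized from labeled class means, and labeled documents receive a hypothetical weight `w` in the prototype update, with fuzzifier `m = 2`.

```python
from collections import Counter

# HYPOTHETICAL stand-in for WordNet: maps a word to concept (synset-like) IDs.
# A real pipeline would look these up in WordNet; this toy dict is illustrative only.
TOY_SYNSETS = {
    "car": ["motor_vehicle"], "automobile": ["motor_vehicle"], "truck": ["motor_vehicle"],
    "wheat": ["grain"], "corn": ["grain"], "barley": ["grain"],
}


def augmented_vector(tokens, vocab, concepts):
    """Term counts over `vocab`, concatenated (assumption) with concept counts over `concepts`."""
    tf = Counter(tokens)
    concept_tf = Counter(c for t in tokens for c in TOY_SYNSETS.get(t, []))
    return [float(tf[t]) for t in vocab] + [float(concept_tf[c]) for c in concepts]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(x, protos, m, eps=1e-12):
    """Standard fuzzy c-means membership of point x in each cluster."""
    d = [sq_dist(x, v) + eps for v in protos]  # eps avoids division by zero
    # With squared distances, the FCM exponent 2/(m-1) becomes 1/(m-1).
    return [1.0 / sum((d[i] / d[j]) ** (1.0 / (m - 1)) for j in range(len(protos)))
            for i in range(len(protos))]


def ssfcm(X, labels, c, m=2.0, w=1.0, iters=50):
    """Simplified semi-supervised fuzzy c-means (a sketch, not the paper's exact variant).

    labels[k] is a class index for labeled documents and None for unlabeled ones.
    Labeled memberships stay crisp and fixed; labeled points are weighted by w
    in the prototype update; only unlabeled memberships are re-estimated.
    """
    dim = len(X[0])
    U = [[1.0 if lab == i else 0.0 for i in range(c)] if lab is not None else None
         for lab in labels]
    # Initial prototypes: class means of the labeled documents only.
    protos = []
    for i in range(c):
        members = [x for x, lab in zip(X, labels) if lab == i]
        assert members, f"class {i} needs at least one labeled document"
        protos.append([sum(col) / len(members) for col in zip(*members)])
    for _ in range(iters):
        # Re-estimate memberships of unlabeled documents from the current prototypes.
        for k, lab in enumerate(labels):
            if lab is None:
                U[k] = memberships(X[k], protos, m)
        # Prototype update: weighted mean with weights w*u^m (labeled) or u^m (unlabeled).
        new = []
        for i in range(c):
            wts = [(w if lab is not None else 1.0) * U[k][i] ** m
                   for k, lab in enumerate(labels)]
            total = sum(wts)
            new.append([sum(wk * X[k][j] for k, wk in enumerate(wts)) / total
                        for j in range(dim)])
        protos = new
    return protos, U


def predict(x, protos, m=2.0):
    """Assign an unseen document to the cluster with the highest membership."""
    u = memberships(x, protos, m)
    return max(range(len(u)), key=u.__getitem__)


if __name__ == "__main__":
    vocab = ["car", "automobile", "truck", "wheat", "corn", "barley"]
    concepts = ["motor_vehicle", "grain"]
    docs = [["car", "car"], ["wheat", "wheat"], ["automobile", "truck"], ["corn", "barley"]]
    labels = [0, 1, None, None]  # two training docs, two unlabeled test docs
    X = [augmented_vector(d, vocab, concepts) for d in docs]
    protos, U = ssfcm(X, labels, c=2)
    print(predict(augmented_vector(["truck"], vocab, concepts), protos))   # 0
    print(predict(augmented_vector(["barley"], vocab, concepts), protos))  # 1
```

In this toy setup, "truck" and "automobile" never occur in the labeled documents, yet the shared `motor_vehicle` concept dimension links them to the "car" class. This mirrors the abstract's motivation for supplementing sparse training data with lexical information.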