Term-weighting approaches in automatic text retrieval
Information Processing and Management: an International Journal
Automatic text processing: the transformation, analysis, and retrieval of information by computer
Automatic text processing: the transformation, analysis, and retrieval of information by computer
Models for retrieval with probabilistic indexing
Information Processing and Management: an International Journal - Modeling data, information and knowledge
Automated learning of decision rules for text categorization
ACM Transactions on Information Systems (TOIS)
An example-based mapping method for text categorization and retrieval
ACM Transactions on Information Systems (TOIS)
Improving text retrieval for the routing problem using latent semantic indexing
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Pivoted document length normalization
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Feature selection, perceptron learning, and a usability case study for text categorization
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Automatic essay grading using text categorization techniques
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Boosting and Rocchio applied to text filtering
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Learning to extract symbolic knowledge from the World Wide Web
AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Learning to classify text from labeled and unlabeled documents
AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
A re-examination of text categorization methods
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval
An Evaluation of Statistical Approaches to Text Categorization
Information Retrieval
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Improving Text Classification by Shrinkage in a Hierarchy of Classes
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
A Linear Text Classification Algorithm Based on Category Relevance Factors
ICADL '02 Proceedings of the 5th International Conference on Asian Digital Libraries: Digital Libraries: People, Knowledge, and Technology
PRICAI '02 Proceedings of the 7th Pacific Rim International Conference on Artificial Intelligence: Trends in Artificial Intelligence
Support Vector Machines Based on a Semantic Kernel for Text Categorization
IJCNN '00 Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN'00)-Volume 5 - Volume 5
Length Normalization in Degraded Text Collections
Length Normalization in Degraded Text Collections
Supervised term weighting for automated text categorization
Proceedings of the 2003 ACM symposium on Applied computing
Classifying web documents in a hierarchy of categories: a comprehensive study
Journal of Intelligent Information Systems
Semi-supervised single-label text categorization using centroid-based classifiers
Proceedings of the 2007 ACM symposium on Applied computing
Effects of Term Distributions on Binary Classification
IEICE - Transactions on Information and Systems
A class-feature-centroid classifier for text categorization
Proceedings of the 18th international conference on World wide web
Automatic text categorization based on content analysis with cognitive situation models
Information Sciences: an International Journal
Text classification for Thai medicinal web pages
PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
Experiments on kernel tree support vector machines for text categorization
PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
Parallel text categorization for multi-dimensional data
PDCAT'04 Proceedings of the 5th international conference on Parallel and Distributed Computing: applications and Technologies
A Survey of Approaches to Web Service Discovery in Service-Oriented Architectures
Journal of Database Management
Class-indexing-based term weighting for automatic text classification
Information Sciences: an International Journal
EasySOC: Making web service outsourcing easier
Information Sciences: an International Journal
Hi-index | 0.00 |
Most of traditional text categorization approaches utilize term frequency (tf) and inverse document frequency (idf) for representing importance of words and/or terms in classifying a text document. This paper describes an approach to apply term distributions, in addition to tf and idf, to improve performance of centroid-based text categorization. Three types of term distributions, called inter-class, intra-class and in-collection distributions, are introduced. These distributions are useful to increase classification accuracy by exploiting information of (1) term distribution among classes, (2) term distribution within a class and (3) term distribution in the whole collection of training data. In addition, this paper investigates how these term distributions contribute to weight each term in documents, e.g., a high term distribution of a word promotes or demotes importance or classification power of that word. To this end, several centroid-based classifiers are constructed with different term weightings. Using various data sets, their performances are investigated and compared to a standard centroid-based classifier (TDIDF) and a centroid-based classifier modified with information gain. Moreover, we also compare them to two well-known methods: k-NN and naïve Bayes. In addition to a unigram model of document representation, a bigram model is also explored. Finally, the effectiveness of term distributions to improve classification accuracy is explored with regard to the training set size and the number of classes.