The natural distribution of textual data used in text classification is often imbalanced. Categories with fewer examples are under-represented, and their classifiers often perform far below a satisfactory level. We tackle this problem with a simple probability-based term weighting scheme that better distinguishes documents in minor categories. The scheme directly uses two critical information ratios, i.e. relevance indicators, which are supported by probability estimates that embody category membership. Our experimental study, which uses both Support Vector Machines and Naive Bayes classifiers and compares extensively against other classic weighting schemes over two benchmark data sets, including Reuters-21578, shows significant improvement for minor categories, while performance for major categories is not jeopardized. Our approach offers a simple and effective way to boost the performance of text classification over skewed data sets.
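The abstract does not spell out the two ratios, so the following is only a minimal sketch of what a ratio-based, category-aware term weight could look like. It assumes the ratios are A/B and A/C, where A is the number of in-category documents containing the term, B the number of out-of-category documents containing it, and C the number of in-category documents lacking it. The function names, the logarithmic combination, and the smoothing constant `eps` are illustrative assumptions, not taken from the text above.

```python
import math


def relevance_counts(docs, labels, term, category):
    """Count documents for one (term, category) pair.

    A: documents in the category that contain the term
    B: documents outside the category that contain the term
    C: documents in the category that do not contain the term
    """
    a = b = c = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        if label == category:
            if has_term:
                a += 1
            else:
                c += 1
        elif has_term:
            b += 1
    return a, b, c


def prob_weight(tf, a, b, c, eps=1.0):
    """Assumed weight: tf * ln(1 + (A/B) * (A/C)).

    eps is a hypothetical smoothing constant added to B and C
    so that the ratios stay defined when a count is zero.
    """
    return tf * math.log(1.0 + (a / (b + eps)) * (a / (c + eps)))


# Toy corpus: "grain" is the minor category (2 of 5 documents).
docs = [
    ["wheat", "price"],
    ["wheat", "export"],
    ["oil", "price"],
    ["oil", "market"],
    ["oil", "wheat"],
]
labels = ["grain", "grain", "crude", "crude", "crude"]

w_wheat = prob_weight(1, *relevance_counts(docs, labels, "wheat", "grain"))
w_price = prob_weight(1, *relevance_counts(docs, labels, "price", "grain"))
```

On this toy corpus, "wheat" occurs in every "grain" document and only once elsewhere, so it receives a larger weight for the minor category than "price", which is spread evenly across both categories. A term that never occurs in the category gets a weight of zero.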