Original Contribution: Stacked generalization
Neural Networks
A sequential algorithm for training text classifiers
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
The effect of adding relevance information in a relevance feedback environment
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
The nature of statistical learning theory
The nature of statistical learning theory
Machine Learning
Machine Learning
Error reduction through learning multiple descriptions
Machine Learning
Method combination for document filtering
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Combining classifiers in text categorization
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Boosting and Rocchio applied to text filtering
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Learning to extract symbolic knowledge from the World Wide Web
AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
A re-examination of text categorization methods
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Improved Boosting Algorithms Using Confidence-rated Predictions
Machine Learning - The Eleventh Annual Conference on computational Learning Theory
Text filtering by boosting naive Bayes classifiers
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Active learning using adaptive resampling
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
BoosTexter: A Boosting-based Systemfor Text Categorization
Machine Learning - Special issue on information retrieval
An Evaluation of Statistical Approaches to Text Categorization
Information Retrieval
A meta-learning approach for text categorization
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Boosting the margin: A new explanation for the effectiveness of voting methods
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Multistrategy Learning for Information Extraction
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Combining Multiple Learning Strategies for Effective Cross Validation
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
THE WEIGHTED MAJORITY ALGORITHM (Supersedes 89-16)
THE WEIGHTED MAJORITY ALGORITHM (Supersedes 89-16)
Stacked generalization: when does it work?
IJCAI'97 Proceedings of the Fifteenth international joint conference on Artifical intelligence - Volume 2
AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 1
A maximal figure-of-merit learning approach to text categorization
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Using dragpushing to refine centroid text classifiers
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Adaptive sampling for thresholding in document filtering and classification
Information Processing and Management: an International Journal
A novel refinement approach for text categorization
Proceedings of the 14th ACM international conference on Information and knowledge management
Incremental mining of information interest for personalized web scanning
Information Systems
ACM Transactions on Information Systems (TOIS)
Large margin DragPushing strategy for centroid text categorization
Expert Systems with Applications: An International Journal
Using hypothesis margin to boost centroid text classifier
Proceedings of the 2007 ACM symposium on Applied computing
Dynamic category profiling for text filtering and classification
Information Processing and Management: an International Journal
Raising the baseline for high-precision text classifiers
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
An incremental cluster-based approach to spam filtering
Expert Systems with Applications: An International Journal
Interactive high-quality text classification
Information Processing and Management: an International Journal
Using WordNet to Disambiguate Word Senses for Text Classification
ICCS '07 Proceedings of the 7th international conference on Computational Science, Part III: ICCS 2007
An Effective Approach to Enhance Centroid Classifier for Text Categorization
PKDD 2007 Proceedings of the 11th European conference on Principles and Practice of Knowledge Discovery in Databases
Document-Base Extraction for Single-Label Text Classification
DaWaK '08 Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery
A class-feature-centroid classifier for text categorization
Proceedings of the 18th international conference on World wide web
Enhancing the Performance of Centroid Classifier by ECOC and Model Refinement
ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part II
A Clustering Framework Based on Adaptive Space Mapping and Rescaling
AIRS '09 Proceedings of the 5th Asia Information Retrieval Symposium on Information Retrieval Technology
Incremental mining of information interest for personalized web scanning
Information Systems
Text classification for healthcare information support
IEA/AIE'07 Proceedings of the 20th international conference on Industrial, engineering, and other applications of applied intelligent systems
Automatic categorization of questions for user-interactive question answering
Information Processing and Management: an International Journal
Dynamic category profiling for text filtering and classification
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Hi-index | 0.00 |
Text categorization or classification is the automated assigning of text documents to pre-defined classes based on their contents. This problem has been studied in information retrieval, machine learning and data mining. So far, many effective techniques have been proposed. However, most techniques are based on some underlying models and/or assumptions. When the data fits the model well, the classification accuracy will be high. However, when the data does not fit the model well, the classification accuracy can be very low. In this paper, we propose a refinement approach to dealing with this problem of model misfit. We show that we do not need to change the classification technique itself (or its underlying model) to make it more flexible. Instead, we propose to use successive refinements of classification on the training data to correct the model misfit. We apply the proposed technique to improve the classification performance of two simple and efficient text classifiers, the Rocchio classifier and the naïve Bayesian classifier. These techniques are suitable for very large text collections because they allow the data to reside on disk and need only one scan of the data to build a text classifier. Extensive experiments on two benchmark document corpora show that the proposed technique is able to improve text categorization accuracy of the two techniques dramatically. In particular, our refined model is able to improve the naïve Bayesian or Rocchio classifier's prediction performance by 45% on average.