IDEAL'06 Proceedings of the 7th international conference on Intelligent Data Engineering and Automated Learning
The paper presents a method for improving text classification by using examples that are difficult to classify. Research on improving text categorization performance has generally focused on enhancing the existing classification models and algorithms themselves, but its scope has been limited by the feature-based statistical methodology. In this paper, we propose a new method that improves accuracy and performance through refinement training and post-processing. In particular, we focus on complex documents that are generally considered hard to classify. Our proposed method differs from traditional classification methods in that it takes a data-mining strategy combined with a fault-tolerant-system approach. In our experiments, we applied the system to documents that usually receive low classification accuracy because they lie on a decision boundary. The results show that our system achieves high accuracy and stability under realistic conditions.
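The core idea of flagging boundary documents and refining training on them can be illustrated with a minimal sketch. This is not the authors' implementation: the toy one-dimensional data, the nearest-centroid classifier, the margin definition, and the threshold and weight values are all illustrative assumptions.

```python
import random

random.seed(0)

# Toy two-class data: two overlapping 1-D clusters of "document scores".
data = [(random.gauss(-1.0, 1.0), 0) for _ in range(50)] + \
       [(random.gauss(+1.0, 1.0), 1) for _ in range(50)]

def train_centroids(data, weights):
    """Weighted class centroids; stands in for any trainable classifier."""
    sums = {0: 0.0, 1: 0.0}
    total = {0: 0.0, 1: 0.0}
    for (x, y), w in zip(data, weights):
        sums[y] += w * x
        total[y] += w
    return sums[0] / total[0], sums[1] / total[1]

def margin(x, c0, c1):
    """Difference of distances to the centroids; near 0 = near the boundary."""
    return abs(x - c0) - abs(x - c1)

# Initial pass: train with uniform weights.
weights = [1.0] * len(data)
c0, c1 = train_centroids(data, weights)

# Flag "hard" documents lying near the decision boundary (assumed threshold).
hard = [abs(margin(x, c0, c1)) < 0.5 for x, _ in data]

# Refinement pass: up-weight the boundary documents and retrain.
weights = [5.0 if h else 1.0 for h in hard]
c0_refined, c1_refined = train_centroids(data, weights)
```

The refinement pass shifts the centroids toward the ambiguous region, which is one simple way a second training round can be biased toward the documents the first-pass classifier found hardest.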