IDEAL'06 Proceedings of the 7th international conference on Intelligent Data Engineering and Automated Learning
The paper presents a method for improving text classification by using examples that are difficult to classify. Research on improving text categorization performance has generally focused on enhancing the existing classification models and algorithms themselves, but its scope has been limited by the feature-based statistical methodology. In this paper, we propose a new method that improves accuracy and performance through refinement training and post-processing. In particular, we focus on complex documents that are generally considered hard to classify. Our proposed method differs from traditional classification methods in that it takes a data-mining strategy combined with a fault-tolerant-system approach. In our experiments, we applied the system to documents that usually receive low classification accuracy because they lie on a decision boundary. The results show that our system achieves high accuracy and stability under realistic conditions.
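The core idea of flagging boundary documents and refining training on them can be illustrated with a minimal sketch. This is not the authors' implementation: the toy one-dimensional data, the nearest-centroid classifier, the margin definition, and the threshold and weight values are all illustrative assumptions.

```python
import random

random.seed(0)

# Toy two-class data: two overlapping 1-D clusters of "document scores".
data = [(random.gauss(-1.0, 1.0), 0) for _ in range(50)] + \
       [(random.gauss(+1.0, 1.0), 1) for _ in range(50)]

def train_centroids(data, weights):
    """Weighted class centroids; stands in for any trainable classifier."""
    sums = {0: 0.0, 1: 0.0}
    total = {0: 0.0, 1: 0.0}
    for (x, y), w in zip(data, weights):
        sums[y] += w * x
        total[y] += w
    return sums[0] / total[0], sums[1] / total[1]

def margin(x, c0, c1):
    """Difference of distances to the centroids; near 0 = near the boundary."""
    return abs(x - c0) - abs(x - c1)

# Initial pass: train with uniform weights.
weights = [1.0] * len(data)
c0, c1 = train_centroids(data, weights)

# Flag "hard" documents lying near the decision boundary (assumed threshold).
hard = [abs(margin(x, c0, c1)) < 0.5 for x, _ in data]

# Refinement pass: up-weight the boundary documents and retrain.
weights = [5.0 if h else 1.0 for h in hard]
c0_refined, c1_refined = train_centroids(data, weights)
```

The refinement pass shifts the centroids toward the ambiguous region, which is one simple way a second training round can be biased toward the documents the first-pass classifier found hardest.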